I have a date and time column which follows the 24-hour format. Now I want to shift it by 6 hours, so that 6 AM of the current day becomes 00:00 and the day ends at 6 AM of the following day. In Excel, subtracting 0.25 from the date column shifts the dates by 6 hours directly, but the same approach doesn't seem to work in R. How does one achieve this in R?
You should provide more information in your question, such as the data you're using.
To replicate that in R, you could use the lubridate package:
library(lubridate)
new_time <- time - hms("06:00:00")
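For example, as a self-contained sketch (the ts vector here is just hypothetical sample data):
library(lubridate)
# hypothetical sample timestamps
ts <- ymd_hm(c("2016-02-10 19:18", "2016-02-11 06:00"))
ts - hms("06:00:00")
# [1] "2016-02-10 13:18:00 UTC" "2016-02-11 00:00:00 UTC"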
Hope this helps
A solution using base R. By default, arithmetic on date-time (POSIXct) objects works in seconds, so:
now <- Sys.time() # gets the current time
now
[1] "2016-04-22 09:52:21 CEST"
now + 6*3600
[1] "2016-04-22 15:52:21 CEST"
With your data, you can try something built around strptime:
df <- read.table(text="
DateTime
2/10/2016 19:18
2/10/2016 19:15
2/10/2016 19:12
2/10/2016 19:09
2/10/2016 19:06
2/10/2016 19:03", sep=";", h=T)
df
DateTime
1 2/10/2016 19:18
2 2/10/2016 19:15
3 2/10/2016 19:12
4 2/10/2016 19:09
5 2/10/2016 19:06
6 2/10/2016 19:03
df$NewTime <- strptime(as.character(df$DateTime), format="%d/%m/%Y %H:%M") + 6*3600
df
DateTime NewTime
1 2/10/2016 19:18 2016-10-03 01:18:00
2 2/10/2016 19:15 2016-10-03 01:15:00
3 2/10/2016 19:12 2016-10-03 01:12:00
4 2/10/2016 19:09 2016-10-03 01:09:00
5 2/10/2016 19:06 2016-10-03 01:06:00
6 2/10/2016 19:03 2016-10-03 01:03:00
You could remove the as.character step with stringsAsFactors = FALSE in read.table.
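Note that this shifts forward by 6 hours; for the shift described in the question (06:00 becoming 00:00, the equivalent of subtracting 0.25 in Excel), subtract instead:
df$NewTime <- strptime(as.character(df$DateTime), format="%d/%m/%Y %H:%M") - 6*3600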
Does it solve your problem?
Related
I have date and time as separate columns, which I combined into a single column using library(lubridate).
Now I want to create a new column that calculates the elapsed time between two consecutive rows for each unique ID.
I tried diff; however, I get an error because diff returns one fewer value than the number of rows in the original data set.
s1$DT <- with(s1, mdy(Date.of.Collection) + hm(MILITARY.TIME)) # this worked; needs the lubridate package
s1$ElapsedTime <- diff(s1$DT) # fails: diff() returns one fewer element than nrow(s1)
units(s1$ElapsedTime) <- "hours"
Subject.ID time DT Time elapsed
1 Dose 8/1/2018 8:15 0
1 time point1 8/1/2018 9:56 0.070138889
1 time point2 8/2/2018 9:56 1.070138889
2 Dose 9/4/2018 10:50 0
2 time point1 9/11/2018 11:00 7.006944444
3 Dose 10/1/2018 10:20 0
3 time point1 10/2/2018 14:22 1.168055556
3 time point2 10/3/2018 12:15 2.079861111
From your comment, you don't need a "diff"; in conventional R-speak, a "diff" would be T1-T0, T2-T1, T3-T2, ..., Tn - Tn-1.
For you, one of these will work to give you each of T1, T2, ..., Tn minus T0.
Base R
do.call(
rbind,
by(patients, patients$Subject.ID, function(x) {
x$elapsed <- x$realDT - x$realDT[1]
units(x$elapsed) <- "hours"
x
})
)
# Subject.ID time1 DT Time elapsed realDT
# 1.1 1 Dose 8/1/2018 8:15 0.000000 hours 2018-08-01 08:15:00
# 1.2 1 time_point1 8/1/2018 9:56 1.683333 hours 2018-08-01 09:56:00
# 1.3 1 time_point2 8/2/2018 9:56 25.683333 hours 2018-08-02 09:56:00
# 2.4 2 Dose 9/4/2018 10:50 0.000000 hours 2018-09-04 10:50:00
# 2.5 2 time_point1 9/11/2018 11:00 168.166667 hours 2018-09-11 11:00:00
# 3.6 3 Dose 10/1/2018 10:20 0.000000 hours 2018-10-01 10:20:00
# 3.7 3 time_point1 10/2/2018 14:22 28.033333 hours 2018-10-02 14:22:00
# 3.8 3 time_point2 10/3/2018 12:15 49.916667 hours 2018-10-03 12:15:00
dplyr
library(dplyr)
patients %>%
group_by(Subject.ID) %>%
mutate(elapsed = `units<-`(realDT - realDT[1], "hours")) %>%
ungroup()
data.table
library(data.table)
patDT <- copy(patients)
setDT(patDT)
patDT[, elapsed := `units<-`(realDT - realDT[1], "hours"), by = "Subject.ID"]
Notes:
The "hours" in the $elapsed column is just an artifact of dealing with a time-difference thing, it should not affect most operations. To get rid of it, make sure you're in the right units ("hours", "secs", ..., see ?units) and use as.numeric.
The only reasons I used as.POSIXct as above are that I'm not a lubridate user, and the data as provided is not in a time format. You shouldn't need it if your Time is a proper time format, in which case you'd use that field instead of my hacky realDT.
On similar lines, if you do calculate realDT and use it, you really don't need both realDT and the pair of DT and Time.
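As an example of the units note above, a minimal sketch of dropping the units label from a difftime:
d <- as.POSIXct("2018-08-02 09:56") - as.POSIXct("2018-08-01 08:15")
units(d) <- "hours"
d             # Time difference of 25.68333 hours
as.numeric(d) # 25.68333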
The data I used:
patients <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Subject.ID time1 DT Time elapsed
1 Dose 8/1/2018 8:15 0
1 time_point1 8/1/2018 9:56 0.070138889
1 time_point2 8/2/2018 9:56 1.070138889
2 Dose 9/4/2018 10:50 0
2 time_point1 9/11/2018 11:00 7.006944444
3 Dose 10/1/2018 10:20 0
3 time_point1 10/2/2018 14:22 1.168055556
3 time_point2 10/3/2018 12:15 2.079861111")
# this is necessary for me because DT/Time here are not POSIXt (they're just strings)
patients$realDT <- as.POSIXct(paste(patients$DT, patients$Time), format = "%m/%d/%Y %H:%M")
My question is about time series data.
Suppose I have one file, named P1, with columns Time.Stamp and Value. The data is given below:
Time.Stamp          Value
01/01/2017 19:08       12
01/01/2017 19:08       24
01/01/2017 19:08       45
01/01/2017 19:08       56
01/01/2017 19:08       78
01/01/2017 19:08       76
01/01/2017 19:08       34
01/01/2017 19:09       65
01/01/2017 19:09       87
I have another separate file, named P2, which has two columns, "Transaction from" and "transaction to":
Transaction from    transaction to
01/01/2017 19:00    01/01/2017 19:15
01/01/2017 19:15    01/01/2017 19:30
02/01/2017 08:45    02/01/2017 09:00
02/01/2017 09:00    02/01/2017 09:15
02/01/2017 09:15    02/01/2017 09:30
02/01/2017 09:30    02/01/2017 09:45
03/01/2017 18:00    03/01/2017 18:15
03/01/2017 18:15    03/01/2017 18:30
03/01/2017 23:45    04/01/2017 00:00
04/01/2017 00:15    04/01/2017 00:30
04/01/2017 01:45    04/01/2017 02:00
Now I want to find, in R, which "Time.Stamp" values from file P1 fall within a "Transaction from" to "transaction to" interval of file P2. If a "Time.Stamp" is in the range given by those two columns of P2, the associated Value should be aggregated. The two files have different numbers of rows; P1 is much longer than P2.
It would be very helpful if anyone could suggest a solution in R.
This is a possible duplicate of How to perform join over date ranges using data.table? Assuming that P1 & P2 are data frames and the dates are POSIXct from the beginning, here is the lifesaver join provided by data.table:
library(data.table)
setDT(P1)
setDT(P2)
P1[ , dummy := Time.Stamp]
setkey(P2, Transaction.from, transaction.to)
dt <- foverlaps(
P1,
P2,
by.x = c("Time.Stamp", "dummy"),
# mult = "first"/mult = "first" will only choose first/last match
nomatch = 0L
)[ , dummy := NULL]
# you can run ?data.table::foverlaps for the documentation
Please refer to this great blog post for a step-by-step explanation and other possible answers.
After this point you can simply:
library(dplyr)
dt %>%
group_by(Transaction.from) %>%
mutate(total = sum(Value))
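If you prefer to stay in data.table for the aggregation as well, the equivalent one-liner on the joined table is (a sketch, assuming the Value column carried over from P1):
dt[, total := sum(Value), by = Transaction.from]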
Please note that this solution may seem long for the simple aggregation you asked. However, it will come very handy if you need to merge the data frames and conduct more complex analysis.
First, convert all dates with as.POSIXct(x, format = "%d/%m/%Y %H:%M"). Then check whether each element of p1$Time.Stamp falls within any period from p2[,1] to p2[,2] using the following function, and aggregate:
isitthere <- function(x, from = p2$`Transaction from`, to = p2$`transaction to`){
  any(x >= from & x <= to)
}
Apply the function to all p1$Time.Stamp:
index<-sapply(p1$Time.Stamp, isitthere,from=p2$`Transaction from`,to=p2$`transaction to`)
index
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Now aggregate:
sum(p1$Value[index])
[1] 477
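If you want one aggregate per interval of p2 instead of a single total, the same comparison can be run per row of p2 (a sketch, assuming the POSIXct conversion described above):
p2$total <- vapply(
  seq_len(nrow(p2)),
  function(i) sum(p1$Value[p1$Time.Stamp >= p2$`Transaction from`[i] &
                           p1$Time.Stamp <= p2$`transaction to`[i]]),
  numeric(1)
)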
I am not clear about what is to be aggregated by what, but assuming that DF1 and DF2 are as defined in the Note at the end, this will, for each row in DF2, look up zero or more rows in DF1 and then sum all Value for those rows having the same Transaction.from and Transaction.to.
library(sqldf)
sqldf("select [Transaction.from], [Transaction.to], sum(Value) as Value
from DF2
left join DF1 on [Time.Stamp] between [Transaction.from] and [Transaction.to]
group by [Transaction.from], [Transaction.to]")
giving:
Transaction.from Transaction.to Value
1 2017-01-01 19:00:00 2017-01-01 19:15:00 477
2 2017-01-01 19:15:00 2017-01-01 19:30:00 NA
3 2017-02-01 08:45:00 2017-02-01 09:00:00 NA
4 2017-02-01 09:00:00 2017-02-01 09:15:00 NA
5 2017-02-01 09:15:00 2017-02-01 09:30:00 NA
6 2017-02-01 09:30:00 2017-02-01 09:45:00 NA
7 2017-03-01 18:00:00 2017-03-01 18:15:00 NA
8 2017-03-01 18:15:00 2017-03-01 18:30:00 NA
9 2017-03-01 23:45:00 2017-04-01 00:00:00 NA
10 2017-04-01 00:15:00 2017-04-01 00:30:00 NA
11 2017-04-01 01:45:00 2017-04-01 02:00:00 NA
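If intervals with no matching Time.Stamp should be dropped rather than reported as NA, switching the left join to an inner join is enough:
sqldf("select [Transaction.from], [Transaction.to], sum(Value) as Value
from DF2
join DF1 on [Time.Stamp] between [Transaction.from] and [Transaction.to]
group by [Transaction.from], [Transaction.to]")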
Note
Lines1 <- "
Time.Stamp,Value
01/01/2017 19:08,12
01/01/2017 19:08,24
01/01/2017 19:08,45
01/01/2017 19:08,56
01/01/2017 19:08,78
01/01/2017 19:08,76
01/01/2017 19:08,34
01/01/2017 19:09,65
01/01/2017 19:09,87
"
DF1 <- read.csv(text = Lines1)
fmt <- "%m/%d/%Y %H:%M"
DF1 <- transform(DF1, Time.Stamp = as.POSIXct(Time.Stamp, format = fmt))
Lines2 <- "
Transaction.from,Transaction.to
01/01/2017 19:00,01/01/2017 19:15
01/01/2017 19:15,01/01/2017 19:30
02/01/2017 08:45,02/01/2017 09:00
02/01/2017 09:00,02/01/2017 09:15
02/01/2017 09:15,02/01/2017 09:30
02/01/2017 09:30,02/01/2017 09:45
03/01/2017 18:00,03/01/2017 18:15
03/01/2017 18:15,03/01/2017 18:30
03/01/2017 23:45,04/01/2017 00:00
04/01/2017 00:15,04/01/2017 00:30
04/01/2017 01:45,04/01/2017 02:00
"
DF2 <- read.csv(text = Lines2)
DF2 <- transform(DF2, Transaction.from = as.POSIXct(Transaction.from, format = fmt),
Transaction.to = as.POSIXct(Transaction.to, format = fmt))
This question already has an answer here: Convert factor to date class for multiple columns
I have a dataframe containing 250 columns of dates and in-times in character format, except for column 1, which contains an Employee ID. How can I convert all the columns except the first to date format?
1 1/5/2015 17:20 1/6/2015 17:19 1/7/2015 16:34 1/8/2015 17:08
2 1/2/2015 18:22 1/5/2015 17:48 NA 1/7/2015 17:09
3 1/2/2015 16:59 1/5/2015 17:06 1/6/2015 16:38 1/7/2015 16:33
4 1/2/2015 17:25 1/5/2015 17:14 1/6/2015 17:07 1/7/2015 16:32
5 1/2/2015 18:31 1/5/2015 17:49 1/6/2015 17:26 1/7/2015 17:37
6 1/2/2015 20:29 1/5/2015 20:57 1/6/2015 21:06 1/7/2015 20:36
The employees' dates and in-times above are in character format.
I tried
parse_date_time(df[,-1], "ymd_HMS") and parse_date_time(df[,2:250], "ymd_HMS")
but neither works, although the syntax does work when specifying a single column. Specifying each of 250 columns individually is bad coding.
Apply the strptime function to turn your character values into datetime values.
df[,2:250] <- as.data.frame(lapply(df[,2:250], strptime, format="%m/%d/%Y %H:%M"))
The df[,2:250] will take only the columns you are interested in.
The format "%m/%d/%Y %H:%M" describes the format of your character entries (month/day/year hour:minute, matching the data shown above). You can see what the letters mean in ?strptime.
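Alternatively, parse_date_time from lubridate (which you already tried) works column-wise when wrapped in lapply; a sketch, assuming the month/day/year hour:minute format shown in your data (the "ymd_HMS" order string in your attempt would not match it):
df[,2:250] <- lapply(df[,2:250], lubridate::parse_date_time, orders = "mdy HM")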
I am trying to replicate something like this with a custom function but I am getting errors. I have the following data frame
> dd
datetimeofdeath injurydatetime
1 2/10/05 17:30
2 2/13/05 19:15
3 2/15/05 1:10
4 2/24/05 21:00 2/16/05 20:36
5 3/11/05 0:45
6 3/19/05 23:05
7 3/19/05 23:13
8 3/23/05 20:51
9 3/31/05 11:30
10 4/9/05 3:07
The typeof of these is integer, but for some reason they have levels as if they were factors. This could be the root of my problem, but I am not sure.
> typeof(dd$datetimeofdeath)
[1] "integer"
> typeof(dd$injurydatetime)
[1] "integer"
> dd$injurydatetime
[1] 2/10/05 17:30 2/13/05 19:15 2/15/05 1:10 2/16/05 20:36 3/11/05 0:45 3/19/05 23:05 3/19/05 23:13 3/23/05 20:51 3/31/05 11:30
[10] 4/9/05 3:07
549 Levels: 1/1/07 18:52 1/1/07 20:51 1/1/08 17:55 1/1/11 15:25 1/1/12 0:22 1/1/12 22:58 1/11/06 23:50 1/11/07 6:26 ... 9/9/10 8:15
Now I would like to apply the following function rowwise()
library(lubridate)
library(dplyr)
get_time_alive = function(datetimeofdeath, injurydatetime)
{
if(as.character(datetimeofdeath) == "" | as.character(injurydatetime) == "") return(NA)
time_of_death = parse_date_time(as.character(datetimeofdeath), "%m/%d/%y %H:%M")
time_of_injury = parse_date_time(as.character(injurydatetime), "%m/%d/%y %H:%M")
time_alive = as.duration(new_interval(time_of_injury,time_of_death))
time_alive_hours = as.numeric(time_alive) / (60*60)
return(time_alive_hours)
}
This works on individual rows, but not when I do the operation rowwise.
> get_time_alive(dd$datetimeofdeath[1], dd$injurydatetime[1])
[1] NA
> get_time_alive(dd$datetimeofdeath[4], dd$injurydatetime[4])
[1] 192.4
> dd = dd %>% rowwise() %>% dplyr::mutate(time_alive_hours=get_time_alive(datetimeofdeath, injurydatetime))
There were 20 warnings (use warnings() to see them)
> dd
Source: local data frame [10 x 3]
Groups:
datetimeofdeath injurydatetime time_alive_hours
1 2/10/05 17:30 NA
2 2/13/05 19:15 NA
3 2/15/05 1:10 NA
4 2/24/05 21:00 2/16/05 20:36 NA
5 3/11/05 0:45 NA
6 3/19/05 23:05 NA
7 3/19/05 23:13 NA
8 3/23/05 20:51 NA
9 3/31/05 11:30 NA
10 4/9/05 3:07 NA
As you can see the fourth element is NA even though when I applied my custom function to it by itself I got 192.4. Why is my custom function failing here?
I think you can simplify your code a lot and just use something like this:
dd %>%
mutate_each(funs(as.POSIXct(as.character(.), format = "%m/%d/%y %H:%M"))) %>%
mutate(time_alive = datetimeofdeath - injurydatetime)
# datetimeofdeath injurydatetime time_alive
#1 <NA> 2005-02-15 01:10:00 NA days
#2 2005-02-24 21:00:00 2005-02-16 20:36:00 8.016667 days
#3 <NA> 2005-03-11 00:45:00 NA days
Side notes:
I shortened your input data because it's not easy to copy (I only took the three rows that you also see in my answer).
If you want the "time_alive" formatted in hours, just use mutate(time_alive = (datetimeofdeath - injurydatetime)*24) in the last mutate (see also the sketch after these notes).
If you use this code, there's no need for rowwise(), which should also make it faster, I guess.
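A slightly more explicit variant of that hours conversion, since as.numeric on a difftime accepts a units argument (a sketch):
dd %>%
  mutate_each(funs(as.POSIXct(as.character(.), format = "%m/%d/%y %H:%M"))) %>%
  mutate(time_alive_hours = as.numeric(datetimeofdeath - injurydatetime, units = "hours"))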
I am quite new to R, have been struggling with converting my data, and could use some much-needed help.
I have a dataframe of approximately 70,000 rows by 2 columns. This data covers a whole year (52 weeks/365 days). A portion of it looks like this:
Create.Date.Time Ticket.ID
1 2013-06-01 12:59:00 INCIDENT684790
2 2013-06-02 07:56:00 SERVICE684793
3 2013-06-02 09:39:00 SERVICE684794
4 2013-06-02 14:14:00 SERVICE684796
5 2013-06-02 17:20:00 SERVICE684797
6 2013-06-03 07:20:00 SERVICE684799
7 2013-06-03 08:02:00 SERVICE684839
8 2013-06-03 08:04:00 SERVICE684841
9 2013-06-03 08:04:00 SERVICE684842
10 2013-06-03 08:08:00 SERVICE684843
I am trying to get the number of tickets in every hour of the week (that is, hour 1 to hour 168) for each week. Hour 1 would start on Monday at 00.00, and hour 168 would be Sunday 23.00-23.59. This would be repeated for each week. I want to use the Create.Date.Time data to calculate the hour of the week the ticket is in, say for:
2013-06-01 12:59:00 INCIDENT684790 - hour 133,
2013-06-03 08:08:00 SERVICE684843 - hour 9
I am then going to compute averages for each hour and plot them. I am completely at a loss as to where to start. Could someone please point me in the right direction?
Before addressing the plotting aspect of your question, is this the format of data you are trying to get? This uses the package lubridate which you might have to install (install.packages("lubridate",dependencies=TRUE)).
library(lubridate)
##
Events <- paste(
sample(c("INCIDENT","SERVICE"),20000,replace=TRUE),
sample(600000:900000,20000)
)
t0 <- as.POSIXct(
"2013-01-01 00:00:00",
format="%Y-%m-%d %H:%M:%S",
tz="America/New_York")
Dates <- sort(t0 + sample(0:(3600*24*365-1),20000))
Weeks <- week(Dates)
wDay <- wday(Dates,label=TRUE)
Hour <- hour(Dates)
##
hourShift <- function(time,wday){
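# map each weekday label to a whole-day hour offset (Mon = 0, Tues = 24, ..., Sun = 144)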
hShift <- sapply(wday, function(X){
if(X=="Mon"){
0
} else if(X=="Tues"){
24*1
} else if(X=="Wed"){
24*2
} else if(X=="Thurs"){
24*3
} else if(X=="Fri"){
24*4
} else if(X=="Sat"){
24*5
} else {
24*6
}
})
##
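# hour of week = hour of day + day offset + 1, so Monday 00:xx is hour 1 and Sunday 23:xx is hour 168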
tOut <- hour(time) + hShift + 1
return(tOut)
}
##
weekHour <- hourShift(time=Dates,wday=wDay)
##
Data <- data.frame(
Event=Events,
Timestamp=Dates,
Week=Weeks,
wDay=wDay,
dayHour=Hour,
weekHour=weekHour,
stringsAsFactors=FALSE)
##
This gives you:
> head(Data)
Event Timestamp Week wDay dayHour weekHour
1 SERVICE 783405 2013-01-01 00:13:55 1 Tues 0 25
2 INCIDENT 860015 2013-01-01 01:06:41 1 Tues 1 26
3 INCIDENT 808309 2013-01-01 01:10:05 1 Tues 1 26
4 INCIDENT 835509 2013-01-01 01:21:44 1 Tues 1 26
5 SERVICE 769239 2013-01-01 02:04:59 1 Tues 2 27
6 SERVICE 762269 2013-01-01 02:07:41 1 Tues 2 27
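From here, counting tickets per hour of the week and averaging across weeks is straightforward; a sketch using base R aggregate on the Data frame built above:
# number of tickets in each (week, hour-of-week) cell
counts <- aggregate(Event ~ Week + weekHour, data = Data, FUN = length)
# average ticket count per hour of the week across all weeks
avgPerHour <- aggregate(Event ~ weekHour, data = counts, FUN = mean)
plot(avgPerHour$weekHour, avgPerHour$Event, type = "h",
     xlab = "Hour of week", ylab = "Average tickets")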