I'm trying to obtain the time difference between two timestamps in hours and minutes (HH:MM).
I have the data:
ID Lat Long Traffic Start_Time End_Time
1 -80.424 40.4242 54 2018-01-01 01:00 2018-01-01 01:10
2 -80.114 40.4131 30 2018-01-01 02:30 2018-01-01 02:40
3 -80.784 40.1142 12 2018-01-01 06:15 2018-01-01 07:20
I want to get the data like this:
ID Lat Long Traffic Start_Time End_Time differ_hrs
1 -80.424 40.4242 54 2018-01-01 01:00 2018-01-01 01:10 00:50
2 -80.114 40.4131 30 2018-01-02 08:30 2018-01-02 08:40 01:10
3 -80.784 40.1142 12 2018-01-04 19:26 2018-01-04 20:11 01:15
I tried this code to capture the difference in hours:
df$differ_hrs<- difftime(df$End_Time, df$Start_Time, units = "hours")
However, it returns the difference as a decimal number of hours:
ID Lat Long Traffic Start_Time End_Time differ_hrs
1 -80.424 40.4242 54 2018-01-01 01:00 2018-01-01 01:10 0.5
2 -80.114 40.4131 30 2018-01-02 08:30 2018-01-02 08:40 0.70
3 -80.784 40.1142 12 2018-01-04 19:26 2018-01-04 20:11 0.75
Then I tried to format the difference in hours as "%H%M" using this code:
df$differ_HHMM<- format(strptime(df$differ_hrs, format="%H%M"), format = "%H:%M")
But it produces all NAs.
So I tried a different approach, calculating the difference and setting the format in the same command by adding "%H%M":
df$differ_HHMM<- as.numeric(difftime(strptime(paste(df[,6]),"%Y-%m-%d %H:%M:%S"), strptime(paste(df[,5]),"%Y-%m-%d %H:%M:%S"),format="%H%M", units = "hours"))
but I keep getting the error message:
Error in difftime(strptime(paste(df[, 6]), "%Y-%m-%d %H:%M:%S"), strptime(paste(df[, :
unused argument (format = "%H:%M:%S")
Is there any way to calculate the time difference in %H:%M format?
I'd really appreciate any suggestions.
The difference is a difftime object, which is built on top of numeric. We can specify the units in difftime() as seconds and then use seconds_to_period() from lubridate:
library(lubridate)
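# difference as a plain number of seconds
df$differ_secs <- as.numeric(difftime(df$End_Time, df$Start_Time,
                                      units = "secs"))
# split the seconds into a Period with day/hour/minute/second parts
out <- seconds_to_period(df$differ_secs)
# zero-padded HH:MM (differences of 24 h or more also carry into out$day, which this ignores)
df$differ_HHMM <- sprintf('%02d:%02d', out$hour, out$minute)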
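NOTE: The %H and %M conversion specifications only apply to date-time classes (Date, POSIXct, POSIXlt), not to numeric or difftime objects. That is why strptime() returns NA when asked to parse the number of hours with "%H%M"; difftime() has no format argument at all, hence the "unused argument" error.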
data
df <- structure(list(ID = 1:3, Lat = c(-80.424, -80.114, -80.784),
Long = c(40.4242, 40.4131, 40.1142), Traffic = c(54L, 30L,
12L), Start_Time = structure(c(1514786400, 1514791800, 1514805300
), class = c("POSIXct", "POSIXt"), tzone = ""), End_Time = structure(c(1514787000,
1514792400, 1514809200), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = c(NA,
-3L), class = "data.frame")
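If you would rather not depend on lubridate, a base-R sketch of the same idea works on whole minutes: integer division by 60 gives the hours and the remainder gives the minutes. The timestamps below are the question's values parsed as UTC (an assumption; a single time zone does not change these gaps), and `mins` is an illustrative name.

```r
# Question's timestamps, parsed as UTC (assumption)
start <- as.POSIXct(c("2018-01-01 01:00", "2018-01-01 02:30", "2018-01-01 06:15"), tz = "UTC")
end   <- as.POSIXct(c("2018-01-01 01:10", "2018-01-01 02:40", "2018-01-01 07:20"), tz = "UTC")

# Difference as a plain number of minutes, rounded so sprintf's %d accepts it
mins <- round(as.numeric(difftime(end, start, units = "mins")))

# Hours via integer division, minutes via the remainder, both zero-padded
differ_HHMM <- sprintf("%02d:%02d", mins %/% 60, mins %% 60)
print(differ_HHMM)
```

Unlike the seconds_to_period() version, a difference of 24 hours or more stays in the hours field here (for example "26:00").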
Related
I have two tables. The first has the columns ID, Start_Date_Time and End_Date_Time; the second has the columns ID, Day, StartTime and EndTime:
ID Start_Date_Time End_Date_Time
1 ABC123 2019-01-05 16:00:00 2019-01-07 20:00:00
2 XYZ123 2019-01-06 05:00:00 2019-01-13 05:00:00
3 XYZ456 2019-01-08 19:00:00 2019-01-13 12:00:00
And
ID Day StartTime EndTime
1 ABC123 Saturday 13:00 18:00
2 XYZ123 Sunday 0:00 6:00
3 XYZ456 Tuesday 0:00 12:00
I need a result column in the first table that counts the hours between Start_Date and End_Date that fall inside the weekly window defined in the second table. In this case, the result should be:
ID Start_Date End_Date Timeline_Hours
ABC123 01/05/2019 16:00 01/07/2019 20:00 2
XYZ123 01/06/2019 5:00 01/13/2019 5:00 6
XYZ456 01/08/2019 19:00 01/13/2019 12:00 0
For the first record, ABC123, the number of hours between Start_Date and End_Date that satisfy the condition is 2.
Reason: the interval starts on Saturday at 16:00 (4 PM) and ends on Monday at 20:00 (8 PM). The condition in the second table is Saturday 13:00 to 18:00, so the overlap is 2 hours (16:00 to 18:00).
Similarly, the second record spans more than a week: the overlap is 1 hour in the first week (5:00 to 6:00) and 5 hours in the second week (0:00 to 5:00), 6 hours in total.
The third record has no overlap, so the result is 0 hours.
Can this be done in R?
Thanks
Nagaraj
df1 <- structure(list(ID = c("ABC123", "XYZ123", "XYZ456"), Start_Date_Time = structure(c(1546675200,
1546722000, 1546945200), class = c("POSIXct", "POSIXt"), tzone = ""),
End_Date_Time = structure(c(1546862400, 1547326800, 1547352000
), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = c(NA,
-3L), class = "data.frame")
df2 <- structure(list(ID = c("ABC123", "XYZ123", "XYZ456"), Day = c("Saturday",
"Sunday", "Tuesday"), StartTime = c("13:00", "0:00", "0:00"),
EndTime = c("18:00", "6:00", "12:00")), row.names = c(NA,
-3L), class = "data.frame")
An option using data.table:
library(data.table)
setDT(df1)
setDT(df2)
fmt <- "%Y-%m-%d %H:%M"
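# every date in the overall range whose weekday matches a Day in df2
# (weekdays() returns names in the current locale, so this assumes English)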
DT <- df1[, {
x <- seq(min(as.IDate(Start_Date_Time)), max(as.IDate(End_Date_Time)), by="1 day")
.(Date=x, Day=weekdays(x))
}][
df2, on=.(Day), nomatch=0L]
hoursDT <- DT[, .(ID, END_HR=seq.POSIXt(as.POSIXct(paste(Date, StartTime), format=fmt) + 60*60,
as.POSIXct(paste(Date, EndTime), format=fmt),
by="1 hour")),
seq_len(nrow(DT))]
# count, per df1 row, the hourly end points inside (Start_Date_Time, End_Date_Time] via a non-equi join on ID
df1[, Timeline_Hours :=
hoursDT[.SD, on=.(ID, END_HR>Start_Date_Time, END_HR<=End_Date_Time), by=.EACHI, .N]$N
]
output for df1:
ID Start_Date_Time End_Date_Time Timeline_Hours
1: ABC123 2019-01-05 16:00:00 2019-01-07 20:00:00 2
2: XYZ123 2019-01-06 05:00:00 2019-01-13 05:00:00 6
3: XYZ456 2019-01-08 19:00:00 2019-01-13 12:00:00 0
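For readers without data.table, the same counting idea (expand each interval into one-hour slots and test each slot against the weekly window) can be sketched in base R. This version assumes hour-aligned timestamps, at most one window per ID, and UTC times; it uses POSIXlt$wday (0 = Sunday) instead of weekdays() so it does not depend on the locale. The helpers `to_hours` and `overlap_hours` are illustrative names, not part of any package.

```r
# Intervals and weekly windows from the question (times assumed to be UTC)
df1 <- data.frame(
  ID    = c("ABC123", "XYZ123", "XYZ456"),
  Start = as.POSIXct(c("2019-01-05 16:00", "2019-01-06 05:00", "2019-01-08 19:00"), tz = "UTC"),
  End   = as.POSIXct(c("2019-01-07 20:00", "2019-01-13 05:00", "2019-01-13 12:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID        = c("ABC123", "XYZ123", "XYZ456"),
  Day       = c("Saturday", "Sunday", "Tuesday"),
  StartTime = c("13:00", "0:00", "0:00"),
  EndTime   = c("18:00", "6:00", "12:00"),
  stringsAsFactors = FALSE
)

# "H:MM" -> hours after midnight as a number
to_hours <- function(hhmm) {
  parts <- strsplit(hhmm, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60, numeric(1))
}

# Weekday names in POSIXlt$wday order (0 = Sunday ... 6 = Saturday)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
df2$wday   <- match(df2$Day, day_names) - 1L
df2$from_h <- to_hours(df2$StartTime)
df2$to_h   <- to_hours(df2$EndTime)

# Count the one-hour slots [t, t + 1h) inside [start, end) that start within the window
overlap_hours <- function(start, end, wday, from_h, to_h) {
  slots <- seq(start, end - 3600, by = "hour")
  lt <- as.POSIXlt(slots)
  h <- lt$hour + lt$min / 60
  sum(lt$wday == wday & h >= from_h & h < to_h)
}

df1$Timeline_Hours <- vapply(seq_len(nrow(df1)), function(i) {
  w <- df2[df2$ID == df1$ID[i], ]
  overlap_hours(df1$Start[i], df1$End[i], w$wday, w$from_h, w$to_h)
}, integer(1))
print(df1)
```

Looping over rows keeps the logic easy to follow; for large tables the data.table non-equi join above is likely to scale better.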
I'm using R and am trying to compute the number of hours someone slept. The bedtimes and wake times are recorded as 24-hour ("military") clock times on the same dummy date, so when I use difftime(), the interval between going to sleep at 9 PM (21:00) and waking up at 7 AM (07:00) comes out as -14 hours instead of 10 hours. What do I need to do to get the correct time difference?
Example data:
bedtime waketime
1 1899-12-31 01:00:00 1899-12-31 06:00:00
2 1899-12-31 21:00:00 1899-12-31 07:00:00
3 1899-12-31 22:00:00 1899-12-31 06:00:00
Script used:
difftime(PSQI$wakeup_3, PSQI$bedtime_1, units = "hours")
[1] 5.00 -14.00 -16.00
when what I am looking for is
[1] 5.00 10.00 8.00
Thank you to anyone who can help!
Combining the comments above from @thelatemail and @Dave2e, we can do:
with(df, ifelse(bedtime > waketime,
  as.numeric(difftime(waketime + 86400, bedtime, units = "hours")),
  as.numeric(difftime(waketime, bedtime, units = "hours"))))
#[1] 5 10 8
We add 86400 seconds (one day) to waketime only when bedtime > waketime, and then take the difference. Make sure the bedtime and waketime columns are actual POSIXct objects.
data
df <- structure(list(bedtime = structure(c(-2209096006, -2209024006,
-2209020406), class = c("POSIXct", "POSIXt"), tzone = ""),
waketime = structure(c(-2209078006, -2209074406, -2209078006),
class = c("POSIXct", "POSIXt"), tzone = "")),
row.names = c(NA, -3L), class = "data.frame")
I'm trying to create an hourly sequence of date-times in this format, from 2018-01-01 01:00 to 2018-03-30 01:00, for each patient, and fill the new empty values with random numbers.
My data look like this:
Patients temperature
Patient1 37
Patient2 36
Patient3 35.4
I want the data to look like this:
Patients temperature Time
Patient1 37 2018-01-01 01:00
Patient2 36 2018-01-01 01:00
Patient3 35.4 2018-01-01 01:00
Patient1 NA 2018-01-01 02:00
Patient2 NA 2018-01-01 02:00
Patient3 NA 2018-01-01 02:00
Patient1 NA 2018-01-01 03:00
Patient2 NA 2018-01-01 03:00
Patient3 NA 2018-01-01 03:00
So the Time variable runs until 2018-03-30 01:00. The new temperature values can be NA at first; I will then replace them with random numbers rather than repeating each patient's temperature value.
I tried this command, but it didn't work, and I don't know how to assign the times to each patient:
Time <- seq (from=as.POSIXct("2018-1-1 01:00"), to=as.POSIXct("2018-3-30 01:00", tz="UTC"), by="hour")
I also tried this command:
dt = data.table(ID = Sensor7$StationID,Time = seq (from=as.POSIXct("2018-01-01 02:00"), to=as.POSIXct("2018-03-30 01:00",format = "%Y-%m-%d %H:%M",by="hour")))
But it gave me this error message:
Error in seq.POSIXt(from = as.POSIXct("2018-01-01 00:00"), to = as.POSIXct("2018-03-30 23:00", :
exactly two of 'to', 'by' and 'length.out' / 'along.with' must be specified
Does anyone have any idea how to get the data into the format I'm looking for, please?
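You weren't too far off. In your second attempt, by="hour" ended up inside the parentheses of as.POSIXct(), so seq() received only from and to, which triggers the error. Try this: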
# I reproduce your data:
library(data.table)
data = data.table::fread(input =
"Patients,temperature
Patient1,37
Patient2,36
Patient3,35.4")
library(dplyr)
Time <- seq(from = as.POSIXct("2018-01-01 01:00", tz = "UTC"), to = as.POSIXct("2018-03-30 01:00", tz = "UTC"), by = "hour")
And this should do what you want:
data %>%
group_by(Patients) %>%
do({data.frame("temperature" = c(.data$temperature, rep(NA,length(Time) - nrow(.data))), Time)})
Here's one way:
dat = data.frame(Patients=paste0("Patient", 1:3), temperature=c(37,36,35.4))
Time = seq(as.POSIXct("2018-01-01 01:00"), as.POSIXct("2018-03-30 01:00"), by="hour")
new.data = data.frame(
Patient = rep(dat$Patients, each=length(Time)),
Time = rep(Time, length(dat$Patients))
)
I'm not sure how you want to generate the random values, but here's a generic method:
new.data$Random.Temperature = rnorm(nrow(new.data), 35, 1)
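To get the row order shown in the question (all patients at each hour, hour by hour), one base-R option is expand.grid(), which varies its first argument fastest. This sketch keeps each patient's original temperature at the first timestamp and leaves NA elsewhere; the UTC time zone is an assumption (it avoids daylight-saving gaps in the hourly sequence), and `patients`, `grid` and `first` are illustrative names.

```r
patients <- data.frame(Patients = c("Patient1", "Patient2", "Patient3"),
                       temperature = c(37, 36, 35.4),
                       stringsAsFactors = FALSE)

# Hourly timestamps, both ends in UTC (assumption)
Time <- seq(as.POSIXct("2018-01-01 01:00", tz = "UTC"),
            as.POSIXct("2018-03-30 01:00", tz = "UTC"), by = "hour")

# Every patient at every hour; Patients varies fastest, so rows go P1, P2, P3, P1, ...
grid <- expand.grid(Patients = patients$Patients, Time = Time,
                    stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE)

# Original temperatures at the first timestamp, NA everywhere else
grid$temperature <- NA_real_
first <- grid$Time == Time[1]
grid$temperature[first] <- patients$temperature[match(grid$Patients[first], patients$Patients)]
grid <- grid[, c("Patients", "temperature", "Time")]
head(grid)
```

The remaining NAs can then be filled with random draws, for example with rnorm() as in the answer above.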
I have some data where the Date column also includes the time. I am trying to convert this data to xts format, but the code below gives an error. Can anyone see what is wrong with it? Thanks in advance.
Date Open High Low Close
1 2017.01.30 07:00 1.25735 1.25761 1.25680 1.25698
2 2017.01.30 08:00 1.25697 1.25702 1.25615 1.25619
3 2017.01.30 09:00 1.25618 1.25669 1.25512 1.25533
4 2017.01.30 10:00 1.25536 1.25571 1.25093 1.25105
5 2017.01.30 11:00 1.25104 1.25301 1.25093 1.25262
6 2017.01.30 12:00 1.25260 1.25479 1.25229 1.25361
7 2017.01.30 13:00 1.25362 1.25417 1.25096 1.25177
8 2017.01.30 14:00 1.25177 1.25219 1.24900 1.25071
9 2017.01.30 15:00 1.25070 1.25307 1.24991 1.25238
10 2017.01.30 16:00 1.25238 1.25358 1.25075 1.25159
df = read.table(file = "GBPUSD60.csv", sep="," , header = TRUE)
dates = as.character(df$Date)
df$Date = NULL
Sept17 = xts(df, as.POSIXct(dates, format="%Y-%m-%d %H:%M"))
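A likely culprit is the format string: the dates are written as 2017.01.30, but the pattern "%Y-%m-%d %H:%M" expects dashes, so as.POSIXct() returns NA for every row (the snippet also never calls library(xts)). A minimal sketch, assuming the CSV holds the same text as the first rows shown above and that UTC is an acceptable time zone; the xts step only runs when the package is installed.

```r
# A few rows from the question, typed in directly (assumption: the CSV holds the same text)
df <- data.frame(
  Date  = c("2017.01.30 07:00", "2017.01.30 08:00", "2017.01.30 09:00"),
  Open  = c(1.25735, 1.25697, 1.25618),
  High  = c(1.25761, 1.25702, 1.25669),
  Low   = c(1.25680, 1.25615, 1.25512),
  Close = c(1.25698, 1.25619, 1.25533),
  stringsAsFactors = FALSE
)

# The original pattern uses "-" separators, so every date fails to parse
bad <- as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M", tz = "UTC")

# The separators in the pattern must match the data: "." between the date parts
idx <- as.POSIXct(df$Date, format = "%Y.%m.%d %H:%M", tz = "UTC")

df$Date <- NULL
if (requireNamespace("xts", quietly = TRUE)) {
  Sept17 <- xts::xts(df, order.by = idx)
  print(Sept17)
}
```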
I have read in and formatted my data set as shown below.
library(xts)
#Read data from file
x <- read.csv("data.dat", header=F)
x[is.na(x)] <- c(0) #If empty fill in zero
#Construct data frames
rawdata.h <- data.frame(x[,2],x[,3],x[,4],x[,5],x[,6],x[,7],x[,8]) #Hourly data
rawdata.15min <- data.frame(x[,10]) #15 min data
#Convert time index to proper format
index.h <- as.POSIXct(strptime(x[,1], "%d.%m.%Y %H:%M"))
index.15min <- as.POSIXct(strptime(x[,9], "%d.%m.%Y %H:%M"))
#Set column names
names(rawdata.h) <- c("spot","RKup", "RKdown","RKcon","anm", "pp.stat","prod.h")
names(rawdata.15min) <- c("prod.15min")
#Convert data frames to time series objects
data.htemp <- xts(rawdata.h,order.by=index.h)
data.15mintemp <- xts(rawdata.15min,order.by=index.15min)
#Select desired subset period
data.h <- data.htemp["2013"]
data.15min <- data.15mintemp["2013"]
I want to combine the hourly data in data.h$prod.h with the 15-minute data in data.15min$prod.15min that falls in the same hour.
An example would be to average the hourly value for 2013-12-01 00:00-01:00 with the last 15-minute value in that same hour, i.e. the 15-minute value for 2013-12-01 00:45-01:00. I'm looking for a flexible way to do this for an arbitrary hour.
Any suggestions?
Edit: Just to clarify further: I want to do something like this:
N <- NROW(data.h$prod.h)
for (i in 1:N){
prod.average[i] <- mean(data.h$prod.h[i] + #INSERT CODE THAT FINDS LAST 15 MIN IN HOUR i )
}
I found a solution by reducing the 15-minute data to one value per hour with the very useful .index* functions from the xts package (here .indexmin), as shown below.
prod.new <- data.15min$prod.15min[.indexmin(data.15min$prod.15min) %in% c(45:59)]
This creates a new time series containing only the values that occur in minutes 45-59 of each hour.
For those curious, my data looked like this:
Original hourly series:
> data.h$prod.h[1:4]
2013-01-01 00:00:00 19.744
2013-01-01 01:00:00 27.866
2013-01-01 02:00:00 26.227
2013-01-01 03:00:00 16.013
Original 15 minute series:
> data.15min$prod.15min[1:8]
2013-09-30 00:00:00 16.4251
2013-09-30 00:15:00 18.4495
2013-09-30 00:30:00 7.2125
2013-09-30 00:45:00 12.1913
2013-09-30 01:00:00 12.4606
2013-09-30 01:15:00 12.7299
2013-09-30 01:30:00 12.9992
2013-09-30 01:45:00 26.7522
New series with only the last 15 minutes in each hour:
> prod.new[1:4]
2013-09-30 00:45:00 12.1913
2013-09-30 01:45:00 26.7522
2013-09-30 02:45:00 5.0332
2013-09-30 03:45:00 2.6974
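To finish the averaging step from the edit, the kept quarter-hour values only need to be labelled with the start of their hour and merged with the hourly series. A base-R sketch on plain data frames, using the values shown above placed on 2013-01-01 in UTC (an assumption, matching the reproducible data in a later answer); `last_q` and `combined` are illustrative names.

```r
hourly <- data.frame(
  time   = as.POSIXct(c("2013-01-01 00:00", "2013-01-01 01:00",
                        "2013-01-01 02:00", "2013-01-01 03:00"), tz = "UTC"),
  prod.h = c(19.744, 27.866, 26.227, 16.013)
)
min15 <- data.frame(
  time       = seq(as.POSIXct("2013-01-01 00:00", tz = "UTC"), by = "15 min", length.out = 8),
  prod.15min = c(16.4251, 18.4495, 7.2125, 12.1913, 12.4606, 12.7299, 12.9992, 26.7522)
)

# Keep only the quarter-hour starting at minute 45 (same idea as .indexmin(...) %in% 45:59)
last_q <- min15[as.POSIXlt(min15$time)$min >= 45, ]

# Label each kept value with the start of its hour so it lines up with the hourly series
last_q$time <- as.POSIXct(trunc(last_q$time, "hours"))

# Inner join on the hour, then average the two values
combined <- merge(hourly, last_q, by = "time")
combined$prod.average <- (combined$prod.h + combined$prod.15min) / 2
print(combined)
```

Hours without a matching 15-minute value are dropped by the inner join; merge(..., all.x = TRUE) would keep them with NA instead.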
Short answer
library(dplyr)
df %>%
group_by(t = cut(time, "30 min")) %>%
summarise(v = mean(value))
Long answer
Since you want to compress the 15-minute time series to a coarser resolution (30 minutes), you can use dplyr or any other package that implements the "group by" concept.
For instance:
s = seq(as.POSIXct("2017-01-01"), as.POSIXct("2017-01-02"), "15 min")
df = data.frame(time = s, value=1:97)
df is a time series with 97 rows and two columns.
head(df)
time value
1 2017-01-01 00:00:00 1
2 2017-01-01 00:15:00 2
3 2017-01-01 00:30:00 3
4 2017-01-01 00:45:00 4
5 2017-01-01 01:00:00 5
6 2017-01-01 01:15:00 6
The cut.POSIXt, group_by and summarise functions do the work:
df %>%
group_by(t = cut(time, "30 min")) %>%
summarise(v = mean(value))
t v
1 2017-01-01 00:00:00 1.5
2 2017-01-01 00:30:00 3.5
3 2017-01-01 01:00:00 5.5
4 2017-01-01 01:30:00 7.5
5 2017-01-01 02:00:00 9.5
6 2017-01-01 02:30:00 11.5
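The same grouping also works without dplyr. A base-R sketch with aggregate() and the same cut() call; the UTC time zone is an assumption added so the example does not depend on the local clock, and `res` is an illustrative name.

```r
s  <- seq(as.POSIXct("2017-01-01", tz = "UTC"), as.POSIXct("2017-01-02", tz = "UTC"), by = "15 min")
df <- data.frame(time = s, value = 1:97)

# Label every row with the start of its 30-minute bin (a factor)
df$t <- cut(df$time, "30 min")

# Mean value per bin
res <- aggregate(value ~ t, data = df, FUN = mean)
head(res)
```

The last bin holds only the 2017-01-02 00:00 observation, so its mean is simply 97.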
A more robust way is to convert the 15-minute values into hourly values by averaging them, and then do whatever operation you want.
### 15 Minutes Data
min15 <- structure(list(V1 = structure(1:8, .Label = c("2013-01-01 00:00:00",
"2013-01-01 00:15:00", "2013-01-01 00:30:00", "2013-01-01 00:45:00",
"2013-01-01 01:00:00", "2013-01-01 01:15:00", "2013-01-01 01:30:00",
"2013-01-01 01:45:00"), class = "factor"), V2 = c(16.4251, 18.4495,
7.2125, 12.1913, 12.4606, 12.7299, 12.9992, 26.7522)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -8L))
min15
### Hourly Data
hourly <- structure(list(V1 = structure(1:4, .Label = c("2013-01-01 00:00:00",
"2013-01-01 01:00:00", "2013-01-01 02:00:00", "2013-01-01 03:00:00"
), class = "factor"), V2 = c(19.744, 27.866, 26.227, 16.013)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -4L))
hourly
### Convert 15min data into hourly data by taking average of 4 values
min15$V1 <- as.POSIXct(min15$V1,origin="1970-01-01 0:0:0")
min15 <- aggregate(. ~ cut(min15$V1,"60 min"),min15[setdiff(names(min15), "V1")],mean)
min15
names(min15) <- c("time","min15")
names(hourly) <- c("time","hourly")
### merge the corresponding values
combined <- merge(hourly,min15)
### average of hourly and 15min values
rowMeans(combined[,2:3])