I have a .csv file with the following data:
duration time | starting time | finish time
1,#1996-06-18 23:25:00#,#1996-06-18 23:26:00#
23,#1996-06-18 23:28:00#,#1996-06-18 23:51:00#
1,#1996-06-18 23:59:00#,#1996-06-19#
1,#1996-06-18 23:24:00#,#1996-06-18 23:25:00#
8,#1996-06-18 23:51:00#,#1996-06-18 23:59:00#
3,#1996-06-19#,#1996-06-19 00:03:00#
12,#1996-06-19 00:12:00#,#1996-06-19 00:24:00#
3,#1996-06-18 23:03:00#,#1996-06-18 23:06:00#
The rows containing #1996-06-19# have incomplete elements: the time part is missing. My question is how I can complete those elements (start time and finish time) in R using the duration and the other element, i.e. add the duration to the start time to obtain the finish time (assuming that this data is in a data frame).
Basically, how do I add 00:00:00 to these rows (in an R data frame)?
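You can use the lubridate package to convert the duration column to a period of minutes, and dplyr to manipulate your data.frame. Values without a time part fail to parse with the full format and become NA; they are then filled in from the other column and the duration: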
library(dplyr)
library(magrittr)
library(lubridate)
df <- data.frame(duration = c(1, 23, 1, 1, 8, 3),
start_time = c("1996-06-18 23:25:00",
"1996-06-18 23:28:00",
"1996-06-18 23:59:00",
"1996-06-18 23:24:00",
"1996-06-18 23:51:00",
"1996-06-19"),
end_time = c("1996-06-18 23:26:00",
"1996-06-18 23:51:00",
"1996-06-19",
"1996-06-18 23:25:00",
"1996-06-18 23:59:00",
"1996-06-19 00:03:00"))
df1 <- df %>%
mutate(start_time = as.POSIXct(start_time, format = "%Y-%m-%d %H:%M:%S"), duration = minutes(duration),
end_time = as.POSIXct(end_time, format = "%Y-%m-%d %H:%M:%S")) %>%
mutate(start_time = ifelse(is.na(start_time),
(end_time - duration),
start_time),
end_time = ifelse(is.na(end_time),
(start_time + duration),
end_time))
df1 %<>%
mutate(start_time = as.POSIXct(start_time, origin = "1970-01-01"),
end_time = as.POSIXct(end_time, origin = "1970-01-01"))
and then your output would look like this:
> df1
duration start_time end_time
1 1M 0S 1996-06-18 23:25:00 1996-06-18 23:26:00
2 23M 0S 1996-06-18 23:28:00 1996-06-18 23:51:00
3 1M 0S 1996-06-18 23:59:00 1996-06-19 00:00:00
4 1M 0S 1996-06-18 23:24:00 1996-06-18 23:25:00
5 8M 0S 1996-06-18 23:51:00 1996-06-18 23:59:00
6 3M 0S 1996-06-19 00:00:00 1996-06-19 00:03:00
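If the goal is only to add the missing 00:00:00, a minimal base R sketch is to pad the date-only strings before parsing. It assumes the values arrive as strings wrapped in # as in the CSV; raw, clean, padded and times are illustrative names, and tz = "UTC" is an assumption made to keep the result independent of the local time zone.

```r
# Two values from the question, one of them without a time part
raw <- c("#1996-06-18 23:59:00#", "#1996-06-19#")

# Strip the '#' delimiters
clean <- gsub("#", "", raw, fixed = TRUE)

# Append midnight to every value that has no time part (no ':' in it)
padded <- ifelse(grepl(":", clean, fixed = TRUE), clean, paste(clean, "00:00:00"))

# Now every value parses with the full format
times <- as.POSIXct(padded, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Use an explicit format when displaying, so midnight is shown as 00:00:00
format(times, "%Y-%m-%d %H:%M:%S")
```

With both columns parsed this way, no value is NA, so the duration is only needed as a consistency check.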
I have added a new column RIDE_LENGTH using mutate function as follows.
df2 <- mutate(df2, RIDE_LENGTH = (ENDED_AT - STARTED_AT))
ENDED_AT and STARTED_AT are in HH:MM:SS format, but my new column shows the result in seconds only.
Example: 12:05:00 - 12:03:00 = 120 secs.
I need the answer to be in the same format, i.e. 00:02:00.
If anyone can tell me how to do that, it would be a great help.
You can use
library(lubridate)
RIDE_LENGTH <- seconds_to_period(RIDE_LENGTH)
There are a few ways in the lubridate package, depending on your desired output. Take your pick:
library(dplyr)
df <- data.frame(
STARTED_AT = as.POSIXct("2022-06-06 12:03:00 UTC"),
ENDED_AT = as.POSIXct("2022-06-06 12:05:00 UTC")
)
df |>
mutate(
RIDE_LENGTH_base = ENDED_AT - STARTED_AT,
RIDE_LENGTH_lubridate_difftime = lubridate::as.difftime(ENDED_AT - STARTED_AT),
RIDE_LENGTH_period = lubridate::as.period(ENDED_AT - STARTED_AT),
RIDE_LENGTH_duration = lubridate::as.duration(ENDED_AT - STARTED_AT)
)
# STARTED_AT ENDED_AT RIDE_LENGTH_base RIDE_LENGTH_lubridate_difftime RIDE_LENGTH_period RIDE_LENGTH_duration
# 1 2022-06-06 12:03:00 2022-06-06 12:05:00 2 mins 2 mins 2M 0S 120s (~2 minutes)
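None of the classes above prints as 00:02:00. If a literal HH:MM:SS string is what is needed, one possible base R sketch is to format the elapsed seconds by hand. The variable names are illustrative, tz = "UTC" is an assumption, and the result is a character string (so it can no longer be used for arithmetic); negative durations are not handled.

```r
# Example values taken from the question; tz = "UTC" is an assumption
started_at <- as.POSIXct("2022-06-06 12:03:00", tz = "UTC")
ended_at   <- as.POSIXct("2022-06-06 12:05:00", tz = "UTC")

# Elapsed time in whole seconds
secs <- as.integer(round(as.numeric(difftime(ended_at, started_at, units = "secs"))))

# Split into hours, minutes and seconds, then zero-pad each part
ride_length <- sprintf("%02d:%02d:%02d",
                       secs %/% 3600L,
                       (secs %% 3600L) %/% 60L,
                       secs %% 60L)
ride_length
```

The same expression works on whole columns, because difftime, %/%, %% and sprintf are all vectorized.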
I have a large data frame with one column of timestamps and a second column of speed measurements (km/h). Here is a short example of the data:
df <- data.frame(time = as.POSIXct(c("2019-04-01 13:55:18", "2019-04-01 14:03:18",
"2019-04-01 14:14:18", "2019-04-01 14:26:55",
"2019-04-01 14:46:55", "2019-04-01 15:01:55")),
speed = c(4.5, 6, 3.2, 5, 4, 2))
Is there any way to create a new data frame with the distance driven in every 20-minute interval from 2019-04-01 14:00:00 to 2019-04-01 15:00:00, assuming that the speed changes linearly between measurements? I was trying to find solutions with integrals, but was not sure if that is the correct way to do it. Thanks for the help!
Here is a solution using a combination of zoo::na.approx and dplyr functions.
library(zoo)
library(dplyr)
seq = data.frame(time = seq(min(df$time),max(df$time), by = 'secs'))
df <- merge(seq,df,all.x=T)
df$speed <- na.approx(df$speed)
df %>%
filter(time >= "2019-04-01 14:00:00" & time < "2019-04-01 15:00:00") %>%
mutate(km = speed/3600) %>%
group_by(group = cut(time, breaks = "20 min")) %>%
summarise(distance = sum(km))
Which gives:
# A tibble: 3 x 2
group distance
<fct> <dbl>
1 2019-04-01 14:00:00 1.50
2 2019-04-01 14:20:00 1.54
3 2019-04-01 14:40:00 1.16
Explanation:
The first step is to create a sequence of timestamps at one-second resolution spanning the data (seq). The sequence is then merged with the data frame, and the resulting NAs in speed are filled by linear interpolation using na.approx.
Then, using dplyr verbs, the data frame is filtered to the requested hour, each per-second speed (km/h) is converted to the distance covered in that second (speed/3600 km), and the 20-minute groups are created using cut. The final distance is the sum of the one-second distances in each 20-minute window.
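Since the question mentions integrals: with piecewise-linear speed, the integral over each window can also be computed exactly with the trapezoidal rule, without expanding the data to one row per second. The following base R sketch is an alternative to the answer above, not part of it; speed_at, interval_km and breaks are illustrative names, and tz = "UTC" is an assumption.

```r
df <- data.frame(
  time = as.POSIXct(c("2019-04-01 13:55:18", "2019-04-01 14:03:18",
                      "2019-04-01 14:14:18", "2019-04-01 14:26:55",
                      "2019-04-01 14:46:55", "2019-04-01 15:01:55"), tz = "UTC"),
  speed = c(4.5, 6, 3.2, 5, 4, 2)
)

# Measurement times in seconds and a linear interpolator for speed (km/h)
t_sec <- as.numeric(df$time)
speed_at <- approxfun(t_sec, df$speed)

# Distance (km) between two instants given in seconds: trapezoids between the
# window edges and every measurement that falls inside the window
interval_km <- function(from, to) {
  knots <- sort(unique(c(from, to, t_sec[t_sec > from & t_sec < to])))
  v <- speed_at(knots)
  sum(diff(knots) * (head(v, -1) + tail(v, -1)) / 2) / 3600
}

# 20-minute window edges from 14:00 to 15:00
breaks <- as.numeric(seq(as.POSIXct("2019-04-01 14:00:00", tz = "UTC"),
                         as.POSIXct("2019-04-01 15:00:00", tz = "UTC"),
                         by = "20 min"))

distance <- mapply(interval_km, head(breaks, -1), tail(breaks, -1))
round(distance, 2)
```

The three values should agree with the per-second sums above to about two decimal places; the small differences come from the one-second discretization.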
I have a data frame with start and stop times for an experiment and I want to calculate the duration of each experiment (one line per experiment). Data frame:
start_t stop_t
7:35 7:48
23:50 00:15
11:22 12:06
I created a function that converts the times to POSIXct and calculates the duration, testing whether start and stop cross midnight:
TimeDiff <- function(t1,t2) {
if (as.numeric(as.POSIXct(paste("2016-01-01", t1))) > as.numeric(as.POSIXct(paste("2016-01-01", t2)))) {
t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1)))
t2n <- as.numeric(as.POSIXct(paste("2016-01-02", t2)))
}
if (as.numeric(as.POSIXct(paste("2016-01-01", t1))) < as.numeric(as.POSIXct(paste("2016-01-01", t2)))) {
t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1)))
t2n <- as.numeric(as.POSIXct(paste("2016-01-01", t2)))
}
#calculate time-difference in seconds
t2n - t1n
}
Then I wanted to apply this function to my data frame using either the 'mutate' function in 'dplyr' or an 'apply' function, e.g.:
mutate(df, dur = TimeDiff(start_t, stop_t))
But the result is that the 'dur' column is filled with the same value in every row. I ended up using a clunky for-loop to apply my function to the data frame, but would like a more elegant solution. Help wanted!
The day can be incremented when the time stamp passes midnight. I am not sure that is necessary if you only want to test whether start and stop cross midnight: a negative difference already tells you. Hope this helps!
df = data.frame(start_t = c("7:35", "23:50","11:22"), stop_t=c("7:48", "00:15", "12:06"), stringsAsFactors = F)
myfun = function(tvec1, tvec2, units_args="secs") {
tvec1_t = as.POSIXct(paste("2016-01-01", tvec1))
tvec2_t = as.POSIXct(paste("2016-01-01", tvec2))
time_diff = difftime(tvec2_t, tvec1_t, units = units_args)
return( time_diff )
}
# append new columns (base R)
df$time_diff = myfun(df$start_t, df$stop_t)
df$cross = ifelse(df$time_diff < 0, 1, 0)
output:
start_t stop_t time_diff cross
1 7:35 7:48 780 secs 0
2 23:50 00:15 -84900 secs 1
3 11:22 12:06 2640 secs 0
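Building on the negative difference in the output above, a possible way to get the actual duration is to add one day (86400 seconds) wherever the difference is negative. This is only a sketch: time_diff_secs is a hypothetical helper name, the dummy date and tz = "UTC" are assumptions, and it presumes no experiment lasts longer than 24 hours.

```r
df <- data.frame(start_t = c("7:35", "23:50", "11:22"),
                 stop_t  = c("7:48", "00:15", "12:06"),
                 stringsAsFactors = FALSE)

# Vectorized duration in seconds; a negative difference is taken to mean
# that the stop time falls on the following day
time_diff_secs <- function(start, stop) {
  t1 <- as.numeric(as.POSIXct(paste("2016-01-01", start),
                              format = "%Y-%m-%d %H:%M", tz = "UTC"))
  t2 <- as.numeric(as.POSIXct(paste("2016-01-01", stop),
                              format = "%Y-%m-%d %H:%M", tz = "UTC"))
  d <- t2 - t1
  ifelse(d < 0, d + 86400, d)
}

df$dur <- time_diff_secs(df$start_t, df$stop_t)
df
```

Because every step operates on whole vectors, the same function can be used inside mutate(df, dur = time_diff_secs(start_t, stop_t)).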
Since you don't have dates but only times, there is indeed the problem of experiments crossing midnight. Your function does not work because it is not vectorized: if() expects a single TRUE or FALSE rather than a whole column, so it cannot decide separately for each row.
The following works but is still not perfectly elegant:
If the start happened before the end, we simply subtract to get the duration.
If we cross midnight (the heuristic for this is not very stable), we calculate the difference until midnight and add the duration on the next day.
library(tidyverse)
diff_time <- function(start, end) {
case_when(start < end ~ end - start,
start > end ~ parse_time("23:59") - start + end + parse_time("0:01")
)
}
df %>%
mutate_all(parse_time) %>%
mutate(duration = diff_time(start_t, stop_t))
#> start_t stop_t duration
#> 1 07:35:00 07:48:00 780 secs
#> 2 23:50:00 00:15:00 1500 secs
#> 3 11:22:00 12:06:00 2640 secs
If you had dates, you could simply do:
df %>%
mutate(duration = stop_t - start_t)
Data
df <- read.table(text = "start_t stop_t
7:35 7:48
23:50 00:15
11:22 12:06", header = T)
The simplest way I can think of involves lubridate:
library(lubridate)
library(dplyr)
#make a fake df
df <- data.frame(start = c('7:35', '23:50', '11:22'), stop = c('7:48', '00:15', '12:06'), stringsAsFactors = FALSE)
#convert to lubridate hours/minutes format, then subtract
df %>%
mutate(start = hm(start), stop = hm(stop)) %>%
mutate(dur= stop - start)
Output:
       start       stop          dur
1  7H 35M 0S  7H 48M 0S       13M 0S
2 23H 50M 0S     15M 0S -23H -35M 0S
3 11H 22M 0S  12H 6M 0S   1H -16M 0S
The problem with your data is the second row: the duration comes out negative, because lubridate assumes both times are on the same day. You should probably add the date:
library(lubridate)
library(dplyr)
#make a fake df
df <- data.frame(start = c('2017/10/08 7:35', '2017/10/08 23:50', '2017/10/08 11:22'), stop = c('2017/10/08 7:48', '2017/10/09 00:15', '2017/10/08 12:06'), stringsAsFactors = FALSE)
#parse as date-times, then subtract
df %>%
mutate(start = ymd_hm(start), stop = ymd_hm(stop)) %>%
mutate(dur= stop - start)
Output:
start stop dur
1 2017-10-08 07:35:00 2017-10-08 07:48:00 13 mins
2 2017-10-08 23:50:00 2017-10-09 00:15:00 25 mins
3 2017-10-08 11:22:00 2017-10-08 12:06:00 44 mins
I have a data frame with hour stamps and the corresponding measured temperatures. The measurements are taken continuously at random intervals. I would like to convert the hours to full date-times, keeping the temperature values. My data frame looks like this (the measurements started on 20/05/2016):
Time, Temp
09.25,28
10.35,28.2
18.25,29
23.50,30
01.10,31
12.00,36
02.00,25
I would like to create a data.frame with respective date-time and Temp like below:
Time, Temp
2016-05-20 09:25,28
2016-05-20 10:35,28.2
2016-05-20 18:25,29
2016-05-20 23:50,30
2016-05-21 01:10,31
2016-05-21 12:00,36
2016-05-22 02:00,25
I would be thankful for any comments or tips on R packages or functions I could look at to do this. Thanks for your time.
A possible solution in base R:
df$Time <- as.POSIXct(strptime(paste('2016-05-20', sprintf('%05.2f',df$Time)), format = '%Y-%m-%d %H.%M', tz = 'GMT'))
df$Time <- df$Time + cumsum(c(0,diff(df$Time)) < 0) * 86400 # 86400 = 60 * 60 * 24
which gives:
> df
Time Temp
1 2016-05-20 09:25:00 28.0
2 2016-05-20 10:35:00 28.2
3 2016-05-20 18:25:00 29.0
4 2016-05-20 23:50:00 30.0
5 2016-05-21 01:10:00 31.0
6 2016-05-21 12:00:00 36.0
7 2016-05-22 02:00:00 25.0
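To see how the day offset in the second line is built, here is a small sketch on the raw hour stamps alone (time_of_day and day_offset are illustrative names). Every time the hour stamp drops relative to the previous row, a new day has started, and cumsum counts those drops.

```r
# Hour stamps from the question, in HH.MM notation
time_of_day <- c(9.25, 10.35, 18.25, 23.50, 1.10, 12.00, 2.00)

# TRUE wherever the stamp is smaller than the previous one (a midnight crossing);
# the leading 0 stands in for the first row, which has no predecessor
crossed <- c(0, diff(time_of_day)) < 0

# Running count of crossings = number of days to add to each row
day_offset <- cumsum(crossed)
day_offset
```

Multiplying day_offset by 86400 gives the number of seconds added to each parsed time. The approach assumes at least one measurement per day and that two consecutive measurements on different days never go up in clock time.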
An alternative with data.table (using shift to compare each time with the previous one, instead of diff):
library(data.table)
setDT(df)[, Time := as.POSIXct(strptime(paste('2016-05-20', sprintf('%05.2f',Time)), format = '%Y-%m-%d %H.%M', tz = 'GMT')) +
cumsum(Time < shift(Time, fill = Time[1])) * 86400]
Or with dplyr:
library(dplyr)
df %>%
mutate(Time = as.POSIXct(strptime(paste('2016-05-20',
sprintf('%05.2f',Time)),
format = '%Y-%m-%d %H.%M', tz = 'GMT')) +
cumsum(c(0,diff(Time)) < 0)*86400)
which will both give the same result.
Used data:
df <- read.table(text='Time, Temp
09.25,28
10.35,28.2
18.25,29
23.50,30
01.10,31
12.00,36
02.00,25', header=TRUE, sep=',')
You can use a custom date format combined with some code that detects when a new day begins (assuming the first measurement of each day takes place earlier in the day than the last measurement of the previous day).
# starting day
start_date = "2016-05-20"
values=read.csv('values.txt', colClasses=c("character",NA))
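# previous reading for each row ("0" for the first row); the times are zero-padded strings, so they compare correctly as text
last=c("0",head(values$Time,-1))
# number of midnight crossings so far
day=cumsum(values$Time<last)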
Time = strptime(paste(start_date,values$Time), "%Y-%m-%d %H.%M")
Time = Time + day*86400
values$Time = Time
Here is an example of a subset of the data in my .csv files. There are three columns with no header: the first is the date/time, the second is the load [kW], and the third is 1 for weekdays and 0 for weekends/holidays.
9/9/2010 3:00 153.94 1
9/9/2010 3:15 148.46 1
I would like to write R code that selects the first and second columns for times between 10:00 and 20:00 on all weekdays (where the third column is 1) in the month of September, and I do not know the best and most efficient way to code this.
dt <- read.csv("file", header = FALSE, sep=",")
#Select a column with weekday designation = 1, weekend or holiday = 0
y <- data.frame(dt[,3])
#Select a column with timestamps and loads
x <- data.frame(dt[,1:2])
t <- data.frame(dt[,1])
#convert timestamps into readable format
s <- strptime("9/1/2010 0:00", format="%m/%d/%Y %H:%M")
e <- strptime("9/30/2010 23:45", format="%m/%d/%Y %H:%M")
range <- seq(s,e, by = "min")
df <- data.frame(range)
The OP asks for the "best and most efficient way to code" this without showing any "inefficient code", so @Justin is right.
It seems that the OP is new to R (and it's officially the summer of love), so I gave it a try and I have a solution (not sure about efficiency...)
index <- c("9/9/2010 19:00", "9/9/2010 21:15", "10/9/2010 11:00", "3/10/2010 10:30")
index <- as.POSIXct(index, format = "%d/%m/%Y %H:%M")
set.seed(1)
Data <- data.frame(Date = index, load = rnorm(4, mean = 120, sd = 10), weeks = c(0, 1, 1, 1))
## Data
## Date load weeks
## 1 2010-09-09 19:00:00 113.74 0
## 2 2010-09-09 21:15:00 121.84 1
## 3 2010-09-10 11:00:00 111.64 1
## 4 2010-10-03 10:30:00 135.95 1
cond <- expression(format(Date, "%H:%M") < "20:00" &
format(Date, "%H:%M") > "10:00" &
weeks == 1 &
format(Date, "%m") == "09")
subset(Data, eval(cond))
## Date load weeks
## 3 2010-09-10 11:00:00 111.64 1
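The same idea can be sketched with plain logical indexing on data shaped like the OP's file (month/day/year timestamps, three unnamed columns). The first two rows below follow the format shown in the question; the remaining rows and values are made up for illustration, and tz = "UTC" is an assumption. Unlike the condition above, this version treats 10:00 and 20:00 as inclusive.

```r
# Stand-in for read.csv("file", header = FALSE): columns V1 (timestamp),
# V2 (load) and V3 (1 = weekday, 0 = weekend/holiday); rows 2-4 are invented
dt <- data.frame(V1 = c("9/9/2010 3:00", "9/9/2010 10:15",
                        "9/11/2010 12:00", "10/1/2010 12:00"),
                 V2 = c(153.94, 148.46, 150.00, 140.00),
                 V3 = c(1, 1, 0, 1),
                 stringsAsFactors = FALSE)

# Parse the timestamps and extract zero-padded clock time and month
ts   <- as.POSIXct(dt$V1, format = "%m/%d/%Y %H:%M", tz = "UTC")
hhmm <- format(ts, "%H:%M")

# Weekday rows in September between 10:00 and 20:00 (inclusive)
keep <- dt$V3 == 1 &
        format(ts, "%m") == "09" &
        hhmm >= "10:00" & hhmm <= "20:00"

# Timestamp and load columns of the matching rows
dt[keep, 1:2]
```

Comparing the "%H:%M" strings works because they are zero-padded, so text order matches clock order.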