How to calculate daily fluid infusion volume with variable infusion rates in R

Working in R, I need to calculate daily infusion volume (mL) given a variable infusion rate (mL/hour).
My dataframe has two columns: a datetime (year, month, day, hours, mins, secs) recording when the infusion rate was changed, and the new infusion rate (mL/hr). From these data I have calculated the cumulative infusion volume for the entire study (~3 weeks). I now need to calculate the infusion volume for every 24-hour period, midnight to midnight. The first and last study days are less than 24 hours long and are excluded.
I don't know how to handle infusion rates that span the midnight boundary between days.
One thought was to generate a new data frame of time in seconds (from zero to end of study) and volume infused per second, then sum the infusion volume for each day (a sketch of this idea follows the sample data below). This would of course generate a large, unnecessary dataframe (>1 million rows).
I am looking for direction on how to approach this in R.
No code to share at this time. My dataframe is shared here: https://drive.google.com/file/d/1YfZkuOStOxWIXrxklWEo1r46hjFQPIXM/view
DF <- structure(list(`date&time` = structure(c(1519043251, 1519047111,
1519049877, 1519050201, 1519053454, 1519054180, 1519060742, 1519062334,
1519083584, 1519108892, 1519114732, 1519118888, 1519127198, 1519140960,
1519142031, 1519150508, 1519161027, 1519167167, 1519206508, 1519206877,
1519222879, 1519278875, 1519290863, 1519293411, 1519314864, 1519317665,
1519334695, 1519364934, 1519364996, 1519378625, 1519384577, 1519428049,
1519495090, 1519541667, 1519544091, 1519551993, 1519594678, 1519626216,
1519650059, 1519658045, 1519712871, 1519722853, 1519726863, 1519744270,
1519786071, 1519787755, 1519788820, 1519789685, 1519791798, 1519801303,
1519801380, 1519809813, 1519815924, 1519826260, 1519830433, 1519833629,
1519841284, 1519857415, 1519885051, 1519885120, 1519885141, 1519887091,
1519939049, 1519939482, 1519945740, 1519971397, 1519975527, 1519987363,
1519988481, 1520004464, 1520033974, 1520093329, 1520179994, 1520204550,
1520233073, 1520237983, 1520238103, 1520241519, 1520241904, 1520263216,
1520290670, 1520349278, 1520370509, 1520406514, 1520436434, 1520447318,
1520456518, 1520461383, 1520501027, 1520522600, 1520542062, 1520590191,
1520618693, 1520621059, 1520626341, 1520627226, 1520630596, 1520637370,
1520664044, 1520676143, 1520689466, 1520717079, 1520724147, 1520754787,
1520788241, 1520806426, 1520818840, 1520829807, 1520839843, 1520839936,
1520891100, 1520897458, 1520921676, 1520933752), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), `infusion rate` = c(25.75, 30.75,
25.75, 25.81, 25.81, 25.75, 25.65, 25.65, 27.55, 18.47, 18.25,
16.25, 15.25, 13.25, 13.25, 15.25, 16.25, 15.25, 15.45, 12.45,
12.25, 12.45, 11.45, 11.5, 11.57, 13.57, 11.57, 10.57, 10.55,
11.55, 13.55, 13.52, 13.56, 13.64, 13.7, 13.67, 13.67, 13.65,
14.65, 14.61, 14.67, 14.69, 13.69, 13.67, 16.67, 21.67, 24.67,
29.67, 34.67, 29.67, 29.65, 24.65, 22.65, 19.65, 19.65, 17.65,
14.65, 14.63, 14.65, 15.65, 14.65, 15.65, 16.65, 15.65, 15.68,
15.71, 15.74, 15.81, 15.92, 15.89, 15.9, 15.94, 15.93, 14.94,
15.92, 16.03, 15.03, 15, 15.02, 14.96, 14.91, 14.93, 14.94, 14.94,
14.91, 14.92, 14.92, 14.92, 14.94, 14.95, 15.95, 14.95, 16.95,
19.95, 22.95, 25.95, 26.95, 26.93, 26.89, 23.89, 20.89, 18.89,
18.87, 16.87, 15.87, 15.87, 14.87, 17.87, 16.87, 16.98, 17.98,
16.98, 15.98, 0)), row.names = 2:115, class = "data.frame")
I need the output to be two columns of data: time in days and daily infusion volume.
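For concreteness, a minimal sketch of the per-second expansion described above (my addition, assuming DF as shared; workable, but it builds a vector of roughly 1.9 million seconds for a three-week study):
secs <- seq(min(DF$`date&time`), max(DF$`date&time`), by = "1 sec")
# carry the last observed rate forward between changes (step interpolation)
rate <- approx(as.numeric(DF$`date&time`), DF$`infusion rate`,
               xout = as.numeric(secs), method = "constant")$y
# mL/hr -> mL/sec, summed per calendar day (partial first/last days included)
daily <- tapply(rate / 3600, as.Date(secs), sum)
The answer below avoids materialising this large vector by splitting intervals instead.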

One possible solution is to use the foverlaps() function from the data.table package. foverlaps() finds all overlapping intervals (ranges, periods) by an overlap join:
library(data.table)
# coerce to data.table
setDT(DF)
# rename columns to syntactically valid names
setnames(DF, make.names(names(DF)))
DF
# create intervals (ranges) of infusion periods
DF_ranges <- DF[, .(start = head(date.time, -1L),
                    end = tail(date.time, -1L),
                    inf.rate = head(infusion.rate, -1L))]
setkey(DF_ranges, start, end)
# create sequence of calendar days (starting at midnight)
day_seq <- DF[, seq(lubridate::floor_date(min(date.time), "day"),
                    max(date.time), "1 day")]
# create intervals of days (from midnight to midnight)
day_ranges <- data.table(start = day_seq, end = day_seq + as.difftime(1, units = "days"))
# find all overlapping intervals (overlap join)
ovl <- foverlaps(day_ranges, DF_ranges)
# compute duration of infusion periods within each day
ovl[, inf.hours := difftime(pmin(end, i.end), pmax(start, i.start), units = "hours")]
# compute infusion volume for each period
ovl[, inf.vol := inf.rate * as.double(inf.hours)]
# aggregate by day
ovl[, .(inf.vol.per.day = sum(inf.vol)), by = .(day = as.Date(i.start))][
  # drop first and last day
  -c(1L, .N)]
day inf.vol.per.day
1: 2018-02-20 455.7107
2: 2018-02-21 324.6403
3: 2018-02-22 293.5880
4: 2018-02-23 298.9512
5: 2018-02-24 324.7212
6: 2018-02-25 327.3658
7: 2018-02-26 338.3609
8: 2018-02-27 338.1620
9: 2018-02-28 507.9508
10: 2018-03-01 368.7672
11: 2018-03-02 379.4539
12: 2018-03-03 381.9141
13: 2018-03-04 381.5335
14: 2018-03-05 360.6198
15: 2018-03-06 358.0437
16: 2018-03-07 358.3588
17: 2018-03-08 361.6632
18: 2018-03-09 421.2107
19: 2018-03-10 567.7771
20: 2018-03-11 413.8286
21: 2018-03-12 403.4742
day inf.vol.per.day
The intermediate results are:
DF_ranges
start end inf.rate
1: 2018-02-19 12:27:31 2018-02-19 13:31:51 25.75
2: 2018-02-19 13:31:51 2018-02-19 14:17:57 30.75
3: 2018-02-19 14:17:57 2018-02-19 14:23:21 25.75
4: 2018-02-19 14:23:21 2018-02-19 15:17:34 25.81
5: 2018-02-19 15:17:34 2018-02-19 15:29:40 25.81
---
109: 2018-03-12 07:30:43 2018-03-12 07:32:16 16.87
110: 2018-03-12 07:32:16 2018-03-12 21:45:00 16.98
111: 2018-03-12 21:45:00 2018-03-12 23:30:58 17.98
112: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98
113: 2018-03-13 06:14:36 2018-03-13 09:35:52 15.98
day_ranges
start end
1: 2018-02-19 2018-02-20
2: 2018-02-20 2018-02-21
3: 2018-02-21 2018-02-22
4: 2018-02-22 2018-02-23
5: 2018-02-23 2018-02-24
6: 2018-02-24 2018-02-25
7: 2018-02-25 2018-02-26
8: 2018-02-26 2018-02-27
9: 2018-02-27 2018-02-28
10: 2018-02-28 2018-03-01
11: 2018-03-01 2018-03-02
12: 2018-03-02 2018-03-03
13: 2018-03-03 2018-03-04
14: 2018-03-04 2018-03-05
15: 2018-03-05 2018-03-06
16: 2018-03-06 2018-03-07
17: 2018-03-07 2018-03-08
18: 2018-03-08 2018-03-09
19: 2018-03-09 2018-03-10
20: 2018-03-10 2018-03-11
21: 2018-03-11 2018-03-12
22: 2018-03-12 2018-03-13
23: 2018-03-13 2018-03-14
start end
foverlaps(day_ranges, DF_ranges)
start end inf.rate i.start i.end
1: 2018-02-19 12:27:31 2018-02-19 13:31:51 25.75 2018-02-19 2018-02-20
2: 2018-02-19 13:31:51 2018-02-19 14:17:57 30.75 2018-02-19 2018-02-20
3: 2018-02-19 14:17:57 2018-02-19 14:23:21 25.75 2018-02-19 2018-02-20
4: 2018-02-19 14:23:21 2018-02-19 15:17:34 25.81 2018-02-19 2018-02-20
5: 2018-02-19 15:17:34 2018-02-19 15:29:40 25.81 2018-02-19 2018-02-20
---
131: 2018-03-12 07:32:16 2018-03-12 21:45:00 16.98 2018-03-12 2018-03-13
132: 2018-03-12 21:45:00 2018-03-12 23:30:58 17.98 2018-03-12 2018-03-13
133: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98 2018-03-12 2018-03-13
134: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98 2018-03-13 2018-03-14
135: 2018-03-13 06:14:36 2018-03-13 09:35:52 15.98 2018-03-13 2018-03-14
ovl
start end inf.rate i.start i.end inf.hours inf.vol
1: 2018-02-19 12:27:31 2018-02-19 13:31:51 25.75 2018-02-19 2018-02-20 1.0722222 hours 27.609722
2: 2018-02-19 13:31:51 2018-02-19 14:17:57 30.75 2018-02-19 2018-02-20 0.7683333 hours 23.626250
3: 2018-02-19 14:17:57 2018-02-19 14:23:21 25.75 2018-02-19 2018-02-20 0.0900000 hours 2.317500
4: 2018-02-19 14:23:21 2018-02-19 15:17:34 25.81 2018-02-19 2018-02-20 0.9036111 hours 23.322203
5: 2018-02-19 15:17:34 2018-02-19 15:29:40 25.81 2018-02-19 2018-02-20 0.2016667 hours 5.205017
---
131: 2018-03-12 07:32:16 2018-03-12 21:45:00 16.98 2018-03-12 2018-03-13 14.2122222 hours 241.323533
132: 2018-03-12 21:45:00 2018-03-12 23:30:58 17.98 2018-03-12 2018-03-13 1.7661111 hours 31.754678
133: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98 2018-03-12 2018-03-13 0.4838889 hours 8.216433
134: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98 2018-03-13 2018-03-14 6.2433333 hours 106.011800
135: 2018-03-13 06:14:36 2018-03-13 09:35:52 15.98 2018-03-13 2018-03-14 3.3544444 hours 53.604022
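As a quick sanity check (my addition, not part of the original answer), the per-period volumes should add up to the cumulative volume for the whole study, partial first and last days included:
ovl[, sum(inf.vol)]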

Related

Subsetting a gps-track dataset based on time intervals gathered from a second dataset

I have a large gps-track dataset and I want to extract only the positions taken while an observer was on duty. In other words, I need to cut the gps-tracks into the transects during which an observer was watching. The watching periods are in a second DB in which the observer registered the start and end of (roughly hourly) watching periods, so that the first start time and last end time registered for each day mark the start and end of that day's watch period in most cases. However, the watching could be paused for some reason and restarted later the same day, so two consecutive annotations can have a time gap between them.
I was trying the match() and dplyr::filter() functions but couldn't come up with a solution. Any idea would be greatly appreciated.
Below is a simplified example.
DB1 (very large gps track to subset)
date time lat lon
1 18/04/2017 6:10 34.01 -53.07
2 18/04/2017 6:20 34.02 -53.09
3 18/04/2017 6:30 34.04 -53.10
4 18/04/2017 6:40 34.05 -53.11
5 18/04/2017 6:50 34.07 -53.13
6 18/04/2017 7:00 34.08 -53.14
7 18/04/2017 7:10 34.01 -53.07
8 18/04/2017 7:20 34.02 -53.09
9 18/04/2017 7:30 34.04 -53.10
. . . . .
. . . . .
. . . . .
n 19/04/2017 6:10 34.05 -53.11
n+1 19/04/2017 6:20 34.07 -53.13
n+2 19/04/2017 6:30 34.08 -53.14
DB2 (watching periods)
date start.watch end.watch
1 2017-04-18 05:00 06:10
2 2017-04-18 06:10 06:30
3 2017-04-18 06:30 06:45
4 2017-04-18 07:20 08:20
. . . .
. . . .
. . . .
n 2017-04-19 06:20 07:20
n+1 2017-04-19 07:20 08:40
Resulting DB should be:
1 18/04/2017 6:10 34.01 -53.07
2 18/04/2017 6:20 34.02 -53.09
3 18/04/2017 6:30 34.04 -53.10
4 18/04/2017 6:40 34.05 -53.11
8 18/04/2017 7:20 34.02 -53.09
9 18/04/2017 7:30 34.04 -53.10
n 19/04/2017 6:10 34.05 -53.11
n+1 19/04/2017 6:20 34.07 -53.13
n+2 19/04/2017 6:30 34.08 -53.14
Here's an alternative that does a range-based (fuzzy) join based on time overlaps. It uses data.table::foverlaps, which does require (at least for this join) that the two frames be proper data.table objects, because it needs the keys to be clearly set.
This method has a few requirements:
All timestamps must be easily comparable numerically, so I'll convert them to POSIXt objects;
Keys are set for at least the second table (and might help in the first). The last two keys for each must be the beginning and end of each time interval; and
Yes, you read that right, even the "single time observations" need two timestamp fields.
NB: I use magrittr solely to break the process into a pipeline of sorts; it is not at all required, it just makes the code easier to read. Also, I use copy() and setDT and then assign to a new variable primarily because (1) I iterated a few times and wanted to start with fresh data each time; and, more importantly, (2) because data.table operates by side effect, I want to encourage you to try this without clobbering your local data until you are comfortable working with side effects. You can easily un-data.table-ize it after the fact.
First, I'll set up the needed conditions.
library(data.table)
library(magrittr)
DB1dt <- copy(DB1) %>%
  setDT() %>%
  .[, dt := as.POSIXct(paste(date, time), format = "%d/%m/%Y %H:%M") ] %>%
  # remove unneeded columns
  .[, c("date", "time") := NULL ] %>%
  .[, dt2 := dt ] %>%
  setkey(dt, dt2)
DB2dt <- copy(DB2) %>%
  setDT() %>%
  .[, startdt := as.POSIXct(paste(date, start.watch), format = "%Y-%m-%d %H:%M") ] %>%
  .[, enddt := as.POSIXct(paste(date, end.watch), format = "%Y-%m-%d %H:%M") - 1e-5 ] %>%
  # remove unneeded columns
  .[, c("date", "start.watch", "end.watch") := NULL ] %>%
  setkey(startdt, enddt)
DB1dt[1:2,]
# lat lon dt dt2
# 1: 34.01 -53.07 2017-04-18 06:10:00 2017-04-18 06:10:00
# 2: 34.02 -53.09 2017-04-18 06:20:00 2017-04-18 06:20:00
DB2dt[1:2,]
# startdt enddt
# 1: 2017-04-18 05:00:00 2017-04-18 06:09:59
# 2: 2017-04-18 06:10:00 2017-04-18 06:29:59
FYI: the use of -1e-5 is because the "within" join is closed on both ends ([a,b], in contrast to right-open [a,b)), so equality on enddt would match. It's over to you whether you want to keep this.
From here, the overlapping join is simply:
foverlaps(DB1dt, DB2dt, type = "within", nomatch = NULL)
# startdt enddt lat lon dt dt2
# 1: 2017-04-18 06:10:00 2017-04-18 06:29:59 34.01 -53.07 2017-04-18 06:10:00 2017-04-18 06:10:00
# 2: 2017-04-18 06:10:00 2017-04-18 06:29:59 34.02 -53.09 2017-04-18 06:20:00 2017-04-18 06:20:00
# 3: 2017-04-18 06:30:00 2017-04-18 06:44:59 34.04 -53.10 2017-04-18 06:30:00 2017-04-18 06:30:00
# 4: 2017-04-18 06:30:00 2017-04-18 06:44:59 34.05 -53.11 2017-04-18 06:40:00 2017-04-18 06:40:00
# 5: 2017-04-18 07:20:00 2017-04-18 08:19:59 34.02 -53.09 2017-04-18 07:20:00 2017-04-18 07:20:00
# 6: 2017-04-18 07:20:00 2017-04-18 08:19:59 34.04 -53.10 2017-04-18 07:30:00 2017-04-18 07:30:00
# 7: 2017-04-19 06:20:00 2017-04-19 07:19:59 34.07 -53.13 2017-04-19 06:20:00 2017-04-19 06:20:00
# 8: 2017-04-19 06:20:00 2017-04-19 07:19:59 34.08 -53.14 2017-04-19 06:30:00 2017-04-19 06:30:00
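If only the original gps columns are wanted back, one option (an assumption about the desired output, not from the original answer) is to drop the helper columns after the join:
foverlaps(DB1dt, DB2dt, type = "within", nomatch = NULL)[, .(dt, lat, lon)]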
Sample data:
DB1 <- read.table(stringsAsFactors = FALSE, header = TRUE, text = "
date time lat lon
18/04/2017 6:10 34.01 -53.07
18/04/2017 6:20 34.02 -53.09
18/04/2017 6:30 34.04 -53.10
18/04/2017 6:40 34.05 -53.11
18/04/2017 6:50 34.07 -53.13
18/04/2017 7:00 34.08 -53.14
18/04/2017 7:10 34.01 -53.07
18/04/2017 7:20 34.02 -53.09
18/04/2017 7:30 34.04 -53.10
19/04/2017 6:10 34.05 -53.11
19/04/2017 6:20 34.07 -53.13
19/04/2017 6:30 34.08 -53.14")
DB2 <- read.table(stringsAsFactors = FALSE, header = TRUE, text = "
date start.watch end.watch
2017-04-18 05:00 06:10
2017-04-18 06:10 06:30
2017-04-18 06:30 06:45
2017-04-18 07:20 08:20
2017-04-19 06:20 07:20
2017-04-19 07:20 08:40")
Related reading:
https://codereview.stackexchange.com/q/224705
https://github.com/Rdatatable/data.table/issues/3721
Here is, I think, the solution to your question.
The code should be clear, but in brief, the key part is to create datetime columns and intervals with the lubridate package, and then use lubridate's %within% function to check if a given time is inside the given intervals.
Hope this helps.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
db1 <- tribble(~date, ~time, ~lat, ~lon,
               "18/04/2017", "6:10", 34.01, -53.07,
               "18/04/2017", "6:20", 34.02, -53.09,
               "18/04/2017", "6:30", 34.04, -53.10,
               "18/04/2017", "6:40", 34.05, -53.11,
               "18/04/2017", "6:50", 34.07, -53.13,
               "18/04/2017", "7:00", 34.08, -53.14,
               "18/04/2017", "7:10", 34.01, -53.07,
               "18/04/2017", "7:20", 34.02, -53.09,
               "18/04/2017", "7:30", 34.04, -53.10
)
db2 <- tribble(~date, ~start.watch, ~end.watch,
               "2017-04-18", "05:00", "06:10",
               "2017-04-18", "06:10", "06:30",
               "2017-04-18", "06:30", "06:45",
               "2017-04-18", "07:20", "08:20")
db2_intervals <- db2 %>%
  mutate(end_date = date) %>%
  unite("start_datetime", date, start.watch) %>%
  unite("end_datetime", end_date, end.watch) %>%
  transmute(interval = interval(start = ymd_hm(start_datetime),
                                end = ymd_hm(end_datetime))) %>%
  pull(interval)
db1 %>%
  unite("datetime", date, time) %>%
  mutate(datetime = lubridate::dmy_hm(datetime)) %>%
  filter(datetime %within% as.list(db2_intervals))
#> # A tibble: 6 x 3
#> datetime lat lon
#> <dttm> <dbl> <dbl>
#> 1 2017-04-18 06:10:00 34.0 -53.1
#> 2 2017-04-18 06:20:00 34.0 -53.1
#> 3 2017-04-18 06:30:00 34.0 -53.1
#> 4 2017-04-18 06:40:00 34.0 -53.1
#> 5 2017-04-18 07:20:00 34.0 -53.1
#> 6 2017-04-18 07:30:00 34.0 -53.1
Created on 2019-10-09 by the reprex package (v0.3.0)
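A note on the as.list() step above: when the right-hand side of %within% is a list of intervals, lubridate tests whether the datetime falls within any of them, which is what makes the single filter() call work. For example:
ymd_hm("2017-04-18 06:40") %within% as.list(db2_intervals)
#> [1] TRUE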

How do I extract data from a data frame based on the months?

I have a data frame, df, with a date column and two variables. I would like to either extract all of the Oct-Dec data or delete the other months' data from the data frame.
I have put the data into a data frame, but at the moment I have the whole year and just want to extract the wanted data. In future I will also be extracting just winter data. I have attached a chunk of my data frame below; I tried using format() with just %m but couldn't get it to work.
14138 2017-09-15 4.655946e-01 0.0603515884
14139 2017-09-16 7.881137e-01 0.0479933304
14140 2017-09-17 5.018990e-01 0.0256871025
14141 2017-09-18 -1.583625e-01 -0.0040893990
14142 2017-09-19 -6.733220e-01 -0.0313100989
14143 2017-09-20 -1.225730e+00 -0.0587706331
14144 2017-09-21 -1.419133e+00 -0.0958125544
14145 2017-09-22 -1.338630e+00 -0.0902803173
14146 2017-09-23 -1.272554e+00 -0.0659170673
14147 2017-09-24 -1.132318e+00 -0.0387240370
14148 2017-09-25 -1.255414e+00 -0.0392615823
14149 2017-09-26 -1.497188e+00 -0.0438491356
14150 2017-09-27 -1.427622e+00 -0.0633879185
14151 2017-09-28 -1.051756e+00 -0.0992427127
14152 2017-09-29 -4.876309e-01 -0.1448044528
14153 2017-09-30 -6.829681e-02 -0.1749463647
14154 2017-10-01 -1.413768e-01 -0.2009916094
14155 2017-10-02 6.359742e-02 -0.1975848313
14156 2017-10-03 9.103277e-01 -0.1828581805
14157 2017-10-04 1.695776e+00 -0.1589352546
14158 2017-10-05 1.913918e+00 -0.1538234614
14159 2017-10-06 1.479714e+00 -0.1937094170
14160 2017-10-07 8.783669e-01 -0.1703790211
14161 2017-10-08 5.706581e-01 -0.1294144428
14162 2017-10-09 4.979405e-01 -0.0666569815
14163 2017-10-10 3.233477e-01 0.0072006102
14164 2017-10-11 3.057630e-01 0.0863445067
14165 2017-10-12 5.877673e-01 0.1097707831
14166 2017-10-13 1.208526e+00 0.1301967193
14167 2017-10-14 1.671705e+00 0.1728109268
14168 2017-10-15 1.810979e+00 0.2264911145
14169 2017-10-16 1.426651e+00 0.2702958315
14170 2017-10-17 1.241140e+00 0.3242637704
14171 2017-10-18 8.997498e-01 0.3879727861
14172 2017-10-19 5.594161e-01 0.4172990825
14173 2017-10-20 3.980254e-01 0.3915170864
14174 2017-10-21 2.138538e-01 0.3249736995
14175 2017-10-22 3.926440e-01 0.2224834840
14176 2017-10-23 2.268644e-01 0.0529143372
14177 2017-10-24 5.664923e-01 -0.0081443464
14178 2017-10-25 6.167520e-01 0.0312073984
14179 2017-10-26 7.751882e-02 0.0043897693
14180 2017-10-27 -5.634851e-02 -0.0726825266
14181 2017-10-28 -2.122061e-01 -0.1711305549
14182 2017-10-29 -8.500991e-01 -0.2068581639
14183 2017-10-30 -1.039685e+00 -0.2909120824
14184 2017-10-31 -3.057745e-01 -0.3933633317
14185 2017-11-01 -1.288774e-01 -0.3726346136
14186 2017-11-02 -5.608007e-03 -0.2425754386
14187 2017-11-03 4.853990e-01 -0.0503543980
14188 2017-11-04 5.822672e-01 0.0896130098
14189 2017-11-05 8.491505e-01 0.1299151006
14190 2017-11-06 1.052999e+00 0.0749888307
14191 2017-11-07 1.170470e+00 0.0287317882
14192 2017-11-08 7.919862e-01 0.0788187381
14193 2017-11-09 4.574565e-01 0.1539981316
14194 2017-11-10 4.552032e-01 0.2034393145
14195 2017-11-11 -3.621350e-01 0.2077476707
14196 2017-11-12 -8.053965e-01 0.1759558604
14197 2017-11-13 -8.307459e-01 0.1802858410
14198 2017-11-14 -9.421325e-01 0.2175529008
14199 2017-11-15 -9.880204e-01 0.2392924580
14200 2017-11-16 -7.448127e-01 0.2519253751
14201 2017-11-17 -8.081435e-01 0.2614254732
14202 2017-11-18 -1.216806e+00 0.2629971336
14203 2017-11-19 -1.122674e+00 0.3469995055
14204 2017-11-20 -1.242597e+00 0.4553094014
14205 2017-11-21 -1.294885e+00 0.5049438231
14206 2017-11-22 -9.325514e-01 0.4684133163
14207 2017-11-23 -4.632281e-01 0.4071673624
14208 2017-11-24 -9.689322e-02 0.3710270269
14209 2017-11-25 4.704467e-01 0.4126721465
14210 2017-11-26 8.682453e-01 0.3745057653
14211 2017-11-27 5.105564e-01 0.2373454931
14212 2017-11-28 4.747265e-01 0.1650783370
14213 2017-11-29 5.905379e-01 0.2632154120
14214 2017-11-30 4.083787e-01 0.3888834762
14215 2017-12-01 3.451736e-01 0.5008047592
14216 2017-12-02 5.161312e-01 0.5388177242
14217 2017-12-03 7.109279e-01 0.5515360710
14218 2017-12-04 4.458635e-01 0.5127537202
14219 2017-12-05 -3.986610e-01 0.3896493238
14220 2017-12-06 -5.968253e-01 0.1095843268
14221 2017-12-07 -1.604398e-01 -0.2455506506
14222 2017-12-08 -4.384744e-01 -0.5801038215
14223 2017-12-09 -7.255016e-01 -0.8384627087
14224 2017-12-10 -9.691828e-01 -0.9223171538
14225 2017-12-11 -1.140588e+00 -0.8177806761
14226 2017-12-12 -1.956622e-01 -0.5250998474
14227 2017-12-13 -1.083792e-01 -0.3430768534
14228 2017-12-14 -8.016345e-02 -0.3163476104
14229 2017-12-15 8.899266e-01 -0.2813253830
14230 2017-12-16 1.322833e+00 -0.2545953062
14231 2017-12-17 1.547972e+00 -0.2275373110
14232 2017-12-18 2.164907e+00 -0.3217205817
14233 2017-12-19 2.276258e+00 -0.5773412429
14234 2017-12-20 1.862291e+00 -0.7728091393
14235 2017-12-21 1.125083e+00 -0.9099696881
14236 2017-12-22 7.737118e-01 -1.2441963604
14237 2017-12-23 7.863508e-01 -1.4802661587
14238 2017-12-24 4.313111e-01 -1.4111320559
14239 2017-12-25 -8.814799e-02 -1.0024805520
14240 2017-12-26 -3.615127e-01 -0.4943077147
14241 2017-12-27 -5.011363e-01 -0.0308588186
14242 2017-12-28 -8.474088e-01 0.3717555895
14243 2017-12-29 -7.283247e-01 0.8230450219
14244 2017-12-30 -4.566981e-01 1.2495961116
14245 2017-12-31 -4.577034e-01 1.4805369230
14246 2018-01-01 1.946166e-01 1.5310004017
14247 2018-01-02 5.203149e-01 1.5384595802
14248 2018-01-03 5.024570e-02 1.4036679018
14249 2018-01-04 -7.065297e-01 1.0749574137
14250 2018-01-05 -8.741815e-01 0.7608524752
14251 2018-01-06 1.589530e-01 0.7891084646
14252 2018-01-07 8.632378e-01 1.1230358751
As requested, the class is "Date".
You can use lubridate and base R:
library(lubridate)
dats[month(ymd(dats$V2)) >= 10,]
# EDIT: if the class of the date variable is already Date, it should simply be
dats[month(dats$V2) >= 10,]
Or fully base without any date work:
dats[substr(dats$V2,6,7) %in% c("10","11","12"),]
With data:
V1 V2 V3 V4
1 14138 2017-09-15 0.4655946 0.06035159
2 14139 2017-09-16 0.7881137 0.04799333
...
From your question, it is unclear what format the date variable is in. Maybe add the output of class(your_date_variable) to the question. As a general rule, though, you'll want to use filter from the dplyr package. Something like this (note that format() returns character, so compare against "10"):
new_data <- data %>% filter(format(date_variable, "%m") >= "10")
This might change slightly depending on the class of your date variable.
Assuming 'date_variable' is of class Date, extract the month and do a comparison in filter (an action verb from dplyr):
library(dplyr)
library(lubridate)
data %>%
  filter(month(date_variable) >= 10)
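For the winter (Dec-Feb) subset mentioned in the question, the same idea extends naturally; a sketch, assuming the date column is of class Date as above:
dats[month(dats$V2) %in% c(12, 1, 2), ]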

How to plot lagged data against other data in R

I would like to lag one variable by, say, 10 time steps and plot it against the other variable, which remains the same. I would like to do this for various lags to see if there is a time period over which the first variable influences the other. The data I have are daily, and after lagging I am keeping only the Dec-Feb data. The problem I am having is that the plot and correlation between the lagged variable and the other data come out the same as the non-lagged plot and correlation every time. I am not sure how to achieve this.
A sample of my data frame "data" can be seen below.
Date x y
14158 2017-10-05 1.913918e+00 -0.1538234614
14159 2017-10-06 1.479714e+00 -0.1937094170
14160 2017-10-07 8.783669e-01 -0.1703790211
14161 2017-10-08 5.706581e-01 -0.1294144428
14162 2017-10-09 4.979405e-01 -0.0666569815
14163 2017-10-10 3.233477e-01 0.0072006102
14164 2017-10-11 3.057630e-01 0.0863445067
14165 2017-10-12 5.877673e-01 0.1097707831
14166 2017-10-13 1.208526e+00 0.1301967193
14167 2017-10-14 1.671705e+00 0.1728109268
14168 2017-10-15 1.810979e+00 0.2264911145
14169 2017-10-16 1.426651e+00 0.2702958315
14170 2017-10-17 1.241140e+00 0.3242637704
14171 2017-10-18 8.997498e-01 0.3879727861
14172 2017-10-19 5.594161e-01 0.4172990825
14173 2017-10-20 3.980254e-01 0.3915170864
14174 2017-10-21 2.138538e-01 0.3249736995
14175 2017-10-22 3.926440e-01 0.2224834840
14176 2017-10-23 2.268644e-01 0.0529143372
14177 2017-10-24 5.664923e-01 -0.0081443464
14178 2017-10-25 6.167520e-01 0.0312073984
14179 2017-10-26 7.751882e-02 0.0043897693
14180 2017-10-27 -5.634851e-02 -0.0726825266
14181 2017-10-28 -2.122061e-01 -0.1711305549
14182 2017-10-29 -8.500991e-01 -0.2068581639
14183 2017-10-30 -1.039685e+00 -0.2909120824
14184 2017-10-31 -3.057745e-01 -0.3933633317
14185 2017-11-01 -1.288774e-01 -0.3726346136
14186 2017-11-02 -5.608007e-03 -0.2425754386
14187 2017-11-03 4.853990e-01 -0.0503543980
14188 2017-11-04 5.822672e-01 0.0896130098
14189 2017-11-05 8.491505e-01 0.1299151006
14190 2017-11-06 1.052999e+00 0.0749888307
14191 2017-11-07 1.170470e+00 0.0287317882
14192 2017-11-08 7.919862e-01 0.0788187381
14193 2017-11-09 4.574565e-01 0.1539981316
14194 2017-11-10 4.552032e-01 0.2034393145
14195 2017-11-11 -3.621350e-01 0.2077476707
14196 2017-11-12 -8.053965e-01 0.1759558604
14197 2017-11-13 -8.307459e-01 0.1802858410
14198 2017-11-14 -9.421325e-01 0.2175529008
14199 2017-11-15 -9.880204e-01 0.2392924580
14200 2017-11-16 -7.448127e-01 0.2519253751
14201 2017-11-17 -8.081435e-01 0.2614254732
14202 2017-11-18 -1.216806e+00 0.2629971336
14203 2017-11-19 -1.122674e+00 0.3469995055
14204 2017-11-20 -1.242597e+00 0.4553094014
14205 2017-11-21 -1.294885e+00 0.5049438231
14206 2017-11-22 -9.325514e-01 0.4684133163
14207 2017-11-23 -4.632281e-01 0.4071673624
14208 2017-11-24 -9.689322e-02 0.3710270269
14209 2017-11-25 4.704467e-01 0.4126721465
14210 2017-11-26 8.682453e-01 0.3745057653
14211 2017-11-27 5.105564e-01 0.2373454931
14212 2017-11-28 4.747265e-01 0.1650783370
14213 2017-11-29 5.905379e-01 0.2632154120
14214 2017-11-30 4.083787e-01 0.3888834762
14215 2017-12-01 3.451736e-01 0.5008047592
14216 2017-12-02 5.161312e-01 0.5388177242
14217 2017-12-03 7.109279e-01 0.5515360710
14218 2017-12-04 4.458635e-01 0.5127537202
14219 2017-12-05 -3.986610e-01 0.3896493238
14220 2017-12-06 -5.968253e-01 0.1095843268
14221 2017-12-07 -1.604398e-01 -0.2455506506
14222 2017-12-08 -4.384744e-01 -0.5801038215
14223 2017-12-09 -7.255016e-01 -0.8384627087
14224 2017-12-10 -9.691828e-01 -0.9223171538
14225 2017-12-11 -1.140588e+00 -0.8177806761
14226 2017-12-12 -1.956622e-01 -0.5250998474
14227 2017-12-13 -1.083792e-01 -0.3430768534
14228 2017-12-14 -8.016345e-02 -0.3163476104
14229 2017-12-15 8.899266e-01 -0.2813253830
14230 2017-12-16 1.322833e+00 -0.2545953062
14231 2017-12-17 1.547972e+00 -0.2275373110
14232 2017-12-18 2.164907e+00 -0.3217205817
14233 2017-12-19 2.276258e+00 -0.5773412429
14234 2017-12-20 1.862291e+00 -0.7728091393
14235 2017-12-21 1.125083e+00 -0.9099696881
14236 2017-12-22 7.737118e-01 -1.2441963604
14237 2017-12-23 7.863508e-01 -1.4802661587
14238 2017-12-24 4.313111e-01 -1.4111320559
14239 2017-12-25 -8.814799e-02 -1.0024805520
14240 2017-12-26 -3.615127e-01 -0.4943077147
14241 2017-12-27 -5.011363e-01 -0.0308588186
14242 2017-12-28 -8.474088e-01 0.3717555895
14243 2017-12-29 -7.283247e-01 0.8230450219
14244 2017-12-30 -4.566981e-01 1.2495961116
14245 2017-12-31 -4.577034e-01 1.4805369230
14246 2018-01-01 1.946166e-01 1.5310004017
14247 2018-01-02 5.203149e-01 1.5384595802
14248 2018-01-03 5.024570e-02 1.4036679018
14249 2018-01-04 -7.065297e-01 1.0749574137
14250 2018-01-05 -8.741815e-01 0.7608524752
14251 2018-01-06 1.589530e-01 0.7891084646
14252 2018-01-07 8.632378e-01 1.1230358751
I am using
lagged <- lag(ts(x), k=10)
This is so the tsp isn't ignored. However, when I do
cor(data$x, data$y)
and
cor(lagged, data$y)
the result is the same, where I would have thought it would be different. How do I get this lag to work before I go ahead and separate by date?
Many thanks!
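A likely explanation (my note, not from the original thread): stats::lag() on a ts object only shifts the series' time attribute (tsp); the underlying values are unchanged, and cor() ignores the time attribute, so the correlation cannot change. Aligning the vectors explicitly gives a genuine lagged correlation; a minimal sketch for a 10-day lag:
k <- 10
n <- nrow(data)
# pair x at time t with y at time t + k (x leading y by k days)
cor(data$x[1:(n - k)], data$y[(k + 1):n])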

How to handle the error "wrong embedding dimension" in the cajolst R function?

When I try to use the cajolst function from the urca package I get a strange error.
Could you please guide me on how to deal with this problem?
result <- urca::cajolst(data, trend = FALSE, K = 2, season = NULL)
Error in embed(diff(x), K) : wrong embedding dimension.
dates A G
2016-11-30 0 0
2016-12-01 -3.53 3.198
2016-12-02 -2.832 8.703
2016-12-04 -2.666 7.799
2016-12-05 -0.54 7.701
2016-12-06 -1.296 4.685
2016-12-07 -1.785 -4.587
2016-12-08 -6.834 -3.696
2016-12-09 -9.624 -5.461
2016-12-11 -11.374 -0.423
2016-12-12 -6.037 -1.614
2016-12-13 -5.934 -3.231
2016-12-14 -7.279 1.072
2016-12-15 -7.859 -4.823
2016-12-16 -15.132 10.838
2016-12-19 -15.345 11.5
2016-12-20 -15.673 6.639
2016-12-21 -15.391 11.162
2016-12-22 -14.357 7.032
2016-12-23 -14.99 12.355
2016-12-26 -15.626 10.944
2016-12-27 -12.297 10.215
2016-12-28 -13.967 5.957
2016-12-29 -12.946 3.446
2016-12-30 -19.681 10.274
2017-01-02 -18.24 8.781
2017-01-03 -16.83 1.116
2017-01-04 -18.189 -0.036
2017-01-05 -15.897 -1.441
2017-01-06 -20.196 -8.534
2017-01-09 -14.57 -28.768
2017-01-10 -13.27 -29.821
2017-01-11 -8.85 -38.881
2017-01-12 -6.375 -50.885
2017-01-13 -8.056 -51.321
2017-01-16 -5.217 -63.619
2017-01-17 -4.75 -39.163
2017-01-18 3.505 -46.309
2017-01-19 10.939 -45.825
2017-01-20 9.248 -42.973
2017-01-23 9.532 -33.396
2017-01-24 4.235 -31.38
2017-01-25 -1.885 -19.21
2017-01-26 -5.027 -15.74
2017-01-27 0.015 -23.029
2017-01-30 -0.685 -30.773
2017-01-31 -2.692 -25.544
2017-02-01 -2.654 -17.912
2017-02-02 4.002 -43.309
2017-02-03 4.813 -52.627
2017-02-06 7.049 -49.965
2017-02-07 10.003 -40.568
2017-02-08 8.996 -39.828
2017-02-09 7.047 -41.19
2017-02-10 7.656 -50.853
2017-02-13 4.986 -41.318
2017-02-14 8.493 -51.946
2017-02-15 12.547 -59.538
2017-02-16 10.327 -54.496
2017-02-17 7.09 -57.571
2017-02-20 11.633 -54.91
2017-02-21 12.664 -51.597
2017-02-22 16.103 -57.819
2017-02-23 14.25 -51.336
2017-02-24 7.794 -54.898
2017-02-27 15.27 -55.754
2017-02-28 19.984 -58.37
2017-03-01 23.899 -70.73
2017-03-02 16.63 -56.29
2017-03-03 16.443 -55.858
2017-03-06 17.901 -59.377
2017-03-07 19.067 -64.383
2017-03-08 17.219 -57.829
2017-03-09 15.694 -55.022
2017-03-10 17.351 -60.431
2017-03-13 18.945 -59.79
2017-03-14 20.001 -64.848
2017-03-15 23.852 -73.806
2017-03-16 22.697 -64.191
2017-03-17 26.892 -65.328
2017-03-20 29.221 -72.764
2017-03-21 25.165 -53.427
2017-03-22 22.998 -51.676
2017-03-23 20.072 -40.57
2017-03-24 20.758 -43.654
2017-03-27 20.062 -33.672
2017-03-28 22.066 -47.184
2017-03-29 22.363 -54.57
2017-03-30 20.684 -48.199
2017-03-31 17.056 -40.887
2017-04-03 19.12 -39.618
2017-04-04 16.359 -37.1
2017-04-05 18.643 -32.734
2017-04-06 14.708 -30.455
2017-04-07 8.403 -33.553
2017-04-10 6.072 -29.048
2017-04-11 5.186 -20.696
2017-04-12 4.248 -20.924
2017-04-13 12.803 -31.075
2017-04-14 12.566 -29.768
2017-04-17 14.065 -28.906
2017-04-18 14.5 4.121
2017-04-19 13.865 8.835
2017-04-20 16.126 6.191
2017-04-21 17.591 3.77
2017-04-24 22.3 -2.497
2017-04-25 22.731 7.408
2017-04-26 19.146 18.45
2017-04-27 19.052 25.541
2017-04-28 21.889 26.878
2017-05-01 27.323 14.362
2017-05-02 29.93 17.525
2017-05-03 19.835 29.856
2017-05-04 19.683 36.72
2017-05-05 13.545 41.055
2017-05-08 14.165 43.544
2017-05-09 11.325 49.978
2017-05-10 10.143 47.072
2017-05-11 13.718 38.901
2017-05-12 14.216 36.017
2017-05-15 13.701 33.797
2017-05-16 13.505 33.867
2017-05-17 13.456 38.004
2017-05-18 12.613 37.758
2017-05-19 11.166 40.367
2017-05-22 12.221 34.022
2017-05-23 13.682 29.793
2017-05-24 10.05 26.701
2017-05-25 10.122 31.394
2017-05-26 7.592 20.073
2017-05-29 6.796 23.809
2017-05-30 9.638 16.1
2017-05-31 7.983 29.043
2017-06-01 3.594 39.557
2017-06-02 8.763 27.863
2017-06-05 12.157 22.397
2017-06-06 13.383 19.053
2017-06-07 20.52 17.449
2017-06-08 19.534 -1.615
2017-06-09 16.011 -1.989
2017-06-12 9.153 -9.294
2017-06-13 4.295 -0.897
2017-06-14 9.743 -9.818
2017-06-15 10.386 -8.255
2017-06-16 11.983 -12.522
2017-06-19 9.513 -12.931
2017-06-20 10.298 -21.024
2017-06-21 11.087 -11.801
2017-06-22 4.472 -9.048
2017-06-23 9.416 -9.592
2017-06-26 9.686 -12.006
2017-06-27 6.424 -2.632
2017-06-28 3.062 -1.016
2017-06-29 5.593 -0.825
2017-06-30 3.531 0.914
2017-07-03 3.208 -2.596
2017-07-04 -6.373 4.289
2017-07-05 -5.149 5.917
2017-07-06 -6.104 12.75
2017-07-07 -9.565 1.615
2017-07-10 -8.961 -0.053
2017-07-11 -4.065 -8.541
2017-07-12 -10.133 -11.286
2017-07-13 -6.223 -15.181
2017-07-14 -1.524 -14.396
2017-07-17 -1.613 -14.61
2017-07-18 5.781 -35.473
2017-07-19 8.243 -44.186
2017-07-20 7.665 -49.857
2017-07-21 0.485 -41.286
2017-07-24 -0.638 -39.127
2017-07-25 0.767 -40.952
2017-07-26 3.566 -44.388
2017-07-27 6.834 -42.543
2017-07-28 1.306 -37.657
2017-07-31 5.839 -34.048
2017-08-01 5.838 -28.939
2017-08-02 7.298 -26.566
2017-08-03 6.804 -32.876
2017-08-04 8.989 -38.618
2017-08-07 8.862 -36.676
2017-08-08 8.234 -40.893
2017-08-09 7.39 -35.16
2017-08-10 8.593 -35.555
2017-08-11 7.253 -35.175
2017-08-14 5.593 -33.644
2017-08-15 4.528 -37.82
2017-08-16 6.752 -53.217
2017-08-17 6.284 -49.252
2017-08-18 4.765 -55.602
2017-08-21 3.905 -54.32
2017-08-22 1.76 -57.853
2017-08-23 0.406 -58.925
2017-08-24 -2.438 -58.098
2017-08-25 -0.791 -56.682
2017-08-28 2.173 -51.278
2017-08-29 2.523 -54.353
2017-08-30 4.482 -46.325
2017-08-31 0.246 -52.567
2017-09-01 -4.214 -53.636
2017-09-04 -4.548 -52.735
2017-09-05 -1.781 -50.421
2017-09-06 -10.463 -51.122
2017-09-07 -13.119 -52.433
2017-09-08 -11.716 -43.493
2017-09-11 -16.15 -43.142
2017-09-12 -12.478 -29.335
2017-09-13 -16.457 -31.697
2017-09-14 -14.615 -15.13
2017-09-15 -13.911 3.023
One of the issues is that the 'dates' column is also included in the input; secondly, the season argument is not needed here: it can be FALSE, or an integer value can be specified.
library(urca)
out <- cajolst(data[-1], trend = FALSE, K = 2, season = FALSE)
If there is a seasonal effect and it is quarterly, the value would be 4:
out1 <- cajolst(data[-1], trend = FALSE, K = 2, season = 4)
out1
#####################################################
# Johansen-Procedure Unit Root / Cointegration Test #
#####################################################
#The value of the test statistic is: 3.6212 13.2233
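To see why the original call failed, a quick check of the column classes (my addition) shows the non-numeric 'dates' column that ends up in embed():
sapply(data, class)
#>       dates           A           G
#> "character"   "numeric"   "numeric"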
data
data <- structure(list(dates = c("2016-11-30", "2016-12-01", "2016-12-02",
"2016-12-04", "2016-12-05", "2016-12-06", "2016-12-07", "2016-12-08",
"2016-12-09", "2016-12-11", "2016-12-12", "2016-12-13", "2016-12-14",
"2016-12-15", "2016-12-16", "2016-12-19", "2016-12-20", "2016-12-21",
"2016-12-22", "2016-12-23", "2016-12-26", "2016-12-27", "2016-12-28",
"2016-12-29", "2016-12-30", "2017-01-02", "2017-01-03", "2017-01-04",
"2017-01-05", "2017-01-06", "2017-01-09", "2017-01-10", "2017-01-11",
"2017-01-12", "2017-01-13", "2017-01-16", "2017-01-17", "2017-01-18",
"2017-01-19", "2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25",
"2017-01-26", "2017-01-27", "2017-01-30", "2017-01-31", "2017-02-01",
"2017-02-02", "2017-02-03", "2017-02-06", "2017-02-07", "2017-02-08",
"2017-02-09", "2017-02-10", "2017-02-13", "2017-02-14", "2017-02-15",
"2017-02-16", "2017-02-17", "2017-02-20", "2017-02-21", "2017-02-22",
"2017-02-23", "2017-02-24", "2017-02-27", "2017-02-28", "2017-03-01",
"2017-03-02", "2017-03-03", "2017-03-06", "2017-03-07", "2017-03-08",
"2017-03-09", "2017-03-10", "2017-03-13", "2017-03-14", "2017-03-15",
"2017-03-16", "2017-03-17", "2017-03-20", "2017-03-21", "2017-03-22",
"2017-03-23", "2017-03-24", "2017-03-27", "2017-03-28", "2017-03-29",
"2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04", "2017-04-05",
"2017-04-06", "2017-04-07", "2017-04-10", "2017-04-11", "2017-04-12",
"2017-04-13", "2017-04-14", "2017-04-17", "2017-04-18", "2017-04-19",
"2017-04-20", "2017-04-21", "2017-04-24", "2017-04-25", "2017-04-26",
"2017-04-27", "2017-04-28", "2017-05-01", "2017-05-02", "2017-05-03",
"2017-05-04", "2017-05-05", "2017-05-08", "2017-05-09", "2017-05-10",
"2017-05-11", "2017-05-12", "2017-05-15", "2017-05-16", "2017-05-17",
"2017-05-18", "2017-05-19", "2017-05-22", "2017-05-23", "2017-05-24",
"2017-05-25", "2017-05-26", "2017-05-29", "2017-05-30", "2017-05-31",
"2017-06-01", "2017-06-02", "2017-06-05", "2017-06-06", "2017-06-07",
"2017-06-08", "2017-06-09", "2017-06-12", "2017-06-13", "2017-06-14",
"2017-06-15", "2017-06-16", "2017-06-19", "2017-06-20", "2017-06-21",
"2017-06-22", "2017-06-23", "2017-06-26", "2017-06-27", "2017-06-28",
"2017-06-29", "2017-06-30", "2017-07-03", "2017-07-04", "2017-07-05",
"2017-07-06", "2017-07-07", "2017-07-10", "2017-07-11", "2017-07-12",
"2017-07-13", "2017-07-14", "2017-07-17", "2017-07-18", "2017-07-19",
"2017-07-20", "2017-07-21", "2017-07-24", "2017-07-25", "2017-07-26",
"2017-07-27", "2017-07-28", "2017-07-31", "2017-08-01", "2017-08-02",
"2017-08-03", "2017-08-04", "2017-08-07", "2017-08-08", "2017-08-09",
"2017-08-10", "2017-08-11", "2017-08-14", "2017-08-15", "2017-08-16",
"2017-08-17", "2017-08-18", "2017-08-21", "2017-08-22", "2017-08-23",
"2017-08-24", "2017-08-25", "2017-08-28", "2017-08-29", "2017-08-30",
"2017-08-31", "2017-09-01", "2017-09-04", "2017-09-05", "2017-09-06",
"2017-09-07", "2017-09-08", "2017-09-11", "2017-09-12", "2017-09-13",
"2017-09-14", "2017-09-15"), A = c(0, -3.53, -2.832, -2.666,
-0.54, -1.296, -1.785, -6.834, -9.624, -11.374, -6.037, -5.934,
-7.279, -7.859, -15.132, -15.345, -15.673, -15.391, -14.357,
-14.99, -15.626, -12.297, -13.967, -12.946, -19.681, -18.24,
-16.83, -18.189, -15.897, -20.196, -14.57, -13.27, -8.85, -6.375,
-8.056, -5.217, -4.75, 3.505, 10.939, 9.248, 9.532, 4.235, -1.885,
-5.027, 0.015, -0.685, -2.692, -2.654, 4.002, 4.813, 7.049, 10.003,
8.996, 7.047, 7.656, 4.986, 8.493, 12.547, 10.327, 7.09, 11.633,
12.664, 16.103, 14.25, 7.794, 15.27, 19.984, 23.899, 16.63, 16.443,
17.901, 19.067, 17.219, 15.694, 17.351, 18.945, 20.001, 23.852,
22.697, 26.892, 29.221, 25.165, 22.998, 20.072, 20.758, 20.062,
22.066, 22.363, 20.684, 17.056, 19.12, 16.359, 18.643, 14.708,
8.403, 6.072, 5.186, 4.248, 12.803, 12.566, 14.065, 14.5, 13.865,
16.126, 17.591, 22.3, 22.731, 19.146, 19.052, 21.889, 27.323,
29.93, 19.835, 19.683, 13.545, 14.165, 11.325, 10.143, 13.718,
14.216, 13.701, 13.505, 13.456, 12.613, 11.166, 12.221, 13.682,
10.05, 10.122, 7.592, 6.796, 9.638, 7.983, 3.594, 8.763, 12.157,
13.383, 20.52, 19.534, 16.011, 9.153, 4.295, 9.743, 10.386, 11.983,
9.513, 10.298, 11.087, 4.472, 9.416, 9.686, 6.424, 3.062, 5.593,
3.531, 3.208, -6.373, -5.149, -6.104, -9.565, -8.961, -4.065,
-10.133, -6.223, -1.524, -1.613, 5.781, 8.243, 7.665, 0.485,
-0.638, 0.767, 3.566, 6.834, 1.306, 5.839, 5.838, 7.298, 6.804,
8.989, 8.862, 8.234, 7.39, 8.593, 7.253, 5.593, 4.528, 6.752,
6.284, 4.765, 3.905, 1.76, 0.406, -2.438, -0.791, 2.173, 2.523,
4.482, 0.246, -4.214, -4.548, -1.781, -10.463, -13.119, -11.716,
-16.15, -12.478, -16.457, -14.615, -13.911), G = c(0, 3.198,
8.703, 7.799, 7.701, 4.685, -4.587, -3.696, -5.461, -0.423, -1.614,
-3.231, 1.072, -4.823, 10.838, 11.5, 6.639, 11.162, 7.032, 12.355,
10.944, 10.215, 5.957, 3.446, 10.274, 8.781, 1.116, -0.036, -1.441,
-8.534, -28.768, -29.821, -38.881, -50.885, -51.321, -63.619,
-39.163, -46.309, -45.825, -42.973, -33.396, -31.38, -19.21,
-15.74, -23.029, -30.773, -25.544, -17.912, -43.309, -52.627,
-49.965, -40.568, -39.828, -41.19, -50.853, -41.318, -51.946,
-59.538, -54.496, -57.571, -54.91, -51.597, -57.819, -51.336,
-54.898, -55.754, -58.37, -70.73, -56.29, -55.858, -59.377, -64.383,
-57.829, -55.022, -60.431, -59.79, -64.848, -73.806, -64.191,
-65.328, -72.764, -53.427, -51.676, -40.57, -43.654, -33.672,
-47.184, -54.57, -48.199, -40.887, -39.618, -37.1, -32.734, -30.455,
-33.553, -29.048, -20.696, -20.924, -31.075, -29.768, -28.906,
4.121, 8.835, 6.191, 3.77, -2.497, 7.408, 18.45, 25.541, 26.878,
14.362, 17.525, 29.856, 36.72, 41.055, 43.544, 49.978, 47.072,
38.901, 36.017, 33.797, 33.867, 38.004, 37.758, 40.367, 34.022,
29.793, 26.701, 31.394, 20.073, 23.809, 16.1, 29.043, 39.557,
27.863, 22.397, 19.053, 17.449, -1.615, -1.989, -9.294, -0.897,
-9.818, -8.255, -12.522, -12.931, -21.024, -11.801, -9.048, -9.592,
-12.006, -2.632, -1.016, -0.825, 0.914, -2.596, 4.289, 5.917,
12.75, 1.615, -0.053, -8.541, -11.286, -15.181, -14.396, -14.61,
-35.473, -44.186, -49.857, -41.286, -39.127, -40.952, -44.388,
-42.543, -37.657, -34.048, -28.939, -26.566, -32.876, -38.618,
-36.676, -40.893, -35.16, -35.555, -35.175, -33.644, -37.82,
-53.217, -49.252, -55.602, -54.32, -57.853, -58.925, -58.098,
-56.682, -51.278, -54.353, -46.325, -52.567, -53.636, -52.735,
-50.421, -51.122, -52.433, -43.493, -43.142, -29.335, -31.697,
-15.13, 3.023)), class = "data.frame", row.names = c(NA, -210L
))

How to convert data captured at 10-minute intervals into 15-minute interval data

I have a dataframe with the data below (the average of the values at timestamps 7:50 and 7:40 should be my value of A for timestamp 7:45):
Date_Time | A
7/28/2017 8:00| 443.75
7/28/2017 7:50| 440.75
7/28/2017 7:45| NA
7/28/2017 7:40| 447.5
7/28/2017 7:30| 448.75
7/28/2017 7:20| 444.5
7/28/2017 7:15| NA
7/28/2017 7:10| 440.25
7/28/2017 7:00| 447.5
I want to transform it into 15-minute intervals, something like below, using the mean:
Date / Time | Object Value
7/28/2017 8:00| 465
7/28/2017 7:45| 464.875
7/28/2017 7:30| 464.75
7/28/2017 7:15| 464.875
7/28/2017 7:00| 465
Update
The OP changed his or her desired output. Since I have no time to update my answer, I will leave it as is. See my comment on the original post for how to use na.interpolation to fill in the missing values.
Original Post
This solution assumes you calculated the averages based on the values at 8:00, 7:30, and 7:00.
library(dplyr)
library(tidyr)
library(lubridate)
library(imputeTS)
dt2 <- dt %>%
  mutate(Date.Time = mdy_hm(Date.Time)) %>%
  filter(Date.Time %in% seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
  complete(Date.Time = seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
  mutate(Object.Value = na.interpolation(Object.Value)) %>%
  fill(Object.Name) %>%
  arrange(desc(Date.Time))
dt2
# A tibble: 5 x 3
Date.Time Object.Name Object.Value
<dttm> <chr> <dbl>
1 2017-07-28 08:00:00 a 465.000
2 2017-07-28 07:45:00 a 464.875
3 2017-07-28 07:30:00 a 464.750
4 2017-07-28 07:15:00 a 464.875
5 2017-07-28 07:00:00 a 465.000
Data
dt <- read.table(text = "'Date Time' 'Object Name' 'Object Value'
'7/28/2017 8:00' a 465
'7/28/2017 7:50' a 465
'7/28/2017 7:40' a 464.75
'7/28/2017 7:30' a 464.75
'7/28/2017 7:20' a 464.75
'7/28/2017 7:10' a 465
'7/28/2017 7:00' a 465",
header = TRUE, stringsAsFactors = FALSE)
If the values measured on the 10-minute intervals are time-integrated averages over that period, it's reasonable to average them to a different period. If these are instantaneous measurements, then it's more reasonable to smooth them as others have suggested.
To take time-integrated averages measured on the 10-minute schedule and average those to the 15-minute schedule, you can use the intervalaverage package:
library(data.table)
library(intervalaverage)
x <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:50", "7/28/2017 7:45",
"7/28/2017 7:40", "7/28/2017 7:30", "7/28/2017 7:20", "7/28/2017 7:15",
"7/28/2017 7:10", "7/28/2017 7:00"), A = c(443.75, 440.75, NA,
447.5, 448.75, 444.5, NA, 440.25, 447.5)), row.names = c(NA,
-9L), class = "data.frame")
y <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:45", "7/28/2017 7:30",
"7/28/2017 7:15", "7/28/2017 7:00")), row.names = c(NA, -5L), class = "data.frame")
setDT(x)
setDT(y)
x
#> time A
#> 1: 7/28/2017 8:00 443.75
#> 2: 7/28/2017 7:50 440.75
#> 3: 7/28/2017 7:45 NA
#> 4: 7/28/2017 7:40 447.50
#> 5: 7/28/2017 7:30 448.75
#> 6: 7/28/2017 7:20 444.50
#> 7: 7/28/2017 7:15 NA
#> 8: 7/28/2017 7:10 440.25
#> 9: 7/28/2017 7:00 447.50
y
#> time
#> 1: 7/28/2017 8:00
#> 2: 7/28/2017 7:45
#> 3: 7/28/2017 7:30
#> 4: 7/28/2017 7:15
#> 5: 7/28/2017 7:00
x[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")]
setnames(x, "time","start_time")
x[, start_time_integer:=as.integer(start_time)]
y[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")]
setnames(y, "time","start_time")
y[, start_time_integer:=as.integer(start_time)]
setkey(y, start_time)
setkey(x, start_time)
## drop the times at :15 and :45
x <- x[!start_time %in% as.POSIXct(c("2017-07-28 07:45:00","2017-07-28 07:15:00"),tz="UTC")]
x[, end_time_integer:=as.integer(start_time)+60L*10L-1L]
x[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")]
y[, end_time_integer:=as.integer(start_time)+60L*15L-1L]
y[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")]
x
#> start_time A start_time_integer end_time_integer
#> 1: 2017-07-28 07:00:00 447.50 1501225200 1501225799
#> 2: 2017-07-28 07:10:00 440.25 1501225800 1501226399
#> 3: 2017-07-28 07:20:00 444.50 1501226400 1501226999
#> 4: 2017-07-28 07:30:00 448.75 1501227000 1501227599
#> 5: 2017-07-28 07:40:00 447.50 1501227600 1501228199
#> 6: 2017-07-28 07:50:00 440.75 1501228200 1501228799
#> 7: 2017-07-28 08:00:00 443.75 1501228800 1501229399
#> end_time
#> 1: 2017-07-28 07:09:59
#> 2: 2017-07-28 07:19:59
#> 3: 2017-07-28 07:29:59
#> 4: 2017-07-28 07:39:59
#> 5: 2017-07-28 07:49:59
#> 6: 2017-07-28 07:59:59
#> 7: 2017-07-28 08:09:59
y
#> start_time start_time_integer end_time_integer end_time
#> 1: 2017-07-28 07:00:00 1501225200 1501226099 2017-07-28 07:14:59
#> 2: 2017-07-28 07:15:00 1501226100 1501226999 2017-07-28 07:29:59
#> 3: 2017-07-28 07:30:00 1501227000 1501227899 2017-07-28 07:44:59
#> 4: 2017-07-28 07:45:00 1501227900 1501228799 2017-07-28 07:59:59
#> 5: 2017-07-28 08:00:00 1501228800 1501229699 2017-07-28 08:14:59
out <- intervalaverage(x,y,interval_vars=c("start_time_integer","end_time_integer"),value_vars="A")
out[, start_time:=as.POSIXct(start_time_integer,origin="1969-12-31 24:00:00",tz="UTC")]
out[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz="UTC")]
out[, list(start_time,end_time, A)]
#> start_time end_time A
#> 1: 2017-07-28 07:00:00 2017-07-28 07:14:59 445.0833
#> 2: 2017-07-28 07:15:00 2017-07-28 07:29:59 443.0833
#> 3: 2017-07-28 07:30:00 2017-07-28 07:44:59 448.3333
#> 4: 2017-07-28 07:45:00 2017-07-28 07:59:59 443.0000
#> 5: 2017-07-28 08:00:00 2017-07-28 08:14:59 NA
# Note that this is just equivalent to taking a weighted.mean:
weighted.mean(c(447.5,440.25),w=c(10,5))
#> [1] 445.0833
weighted.mean(c(440.25,444.5),w=c(5,10))
#> [1] 443.0833
#etc
Note that the intervalaverage package requires integer columns defining closed intervals, hence the conversion to integer. The integers are converted back to datetime (POSIXct) for readability.
