How to calculate daily fluid infusion volume with variable infusion rates - r
Working in R, I need to calculate daily infusion volume (mL) given a variable infusion rate (mL/hour).
My dataframe has two columns: the date-time (year, month, day, hours, mins, secs) at which the infusion rate was changed, and the new infusion rate (mL/hr). From these data I have calculated the cumulative infusion volume for the entire study (~3 weeks). I now need to calculate the infusion volume for every 24-hour period, midnight to midnight. The first and last study days are less than 24 hours long and are excluded.
I don't know how to handle infusion rates that span the midnight boundary between 24-hour periods.
One thought was to generate a new data frame with one row per second (from zero to the end of the study) and the volume infused in that second, then sum the infusion volume for each day. This would of course generate a large, unnecessary data frame (>1 million rows).
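For illustration only, here is a rough, untested sketch of that per-second idea (assuming the DF shown below; base R only):
times <- DF$`date&time`
rates <- DF$`infusion rate`
# one element per second of the study: ~1.9 million seconds over ~22 days
secs  <- seq(min(times), max(times), by = "1 sec")
# index of the most recent rate change at or before each second
idx   <- findInterval(as.numeric(secs), as.numeric(times))
vol_per_sec <- rates[idx] / 3600                  # mL delivered in that second
daily <- tapply(vol_per_sec, as.Date(secs), sum)  # sum per calendar day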
I am looking for direction on how to approach this in R.
I have no working code to share at this time. My dataframe is shared: https://drive.google.com/file/d/1YfZkuOStOxWIXrxklWEo1r46hjFQPIXM/view
DF <- structure(list(`date&time` = structure(c(1519043251, 1519047111,
1519049877, 1519050201, 1519053454, 1519054180, 1519060742, 1519062334,
1519083584, 1519108892, 1519114732, 1519118888, 1519127198, 1519140960,
1519142031, 1519150508, 1519161027, 1519167167, 1519206508, 1519206877,
1519222879, 1519278875, 1519290863, 1519293411, 1519314864, 1519317665,
1519334695, 1519364934, 1519364996, 1519378625, 1519384577, 1519428049,
1519495090, 1519541667, 1519544091, 1519551993, 1519594678, 1519626216,
1519650059, 1519658045, 1519712871, 1519722853, 1519726863, 1519744270,
1519786071, 1519787755, 1519788820, 1519789685, 1519791798, 1519801303,
1519801380, 1519809813, 1519815924, 1519826260, 1519830433, 1519833629,
1519841284, 1519857415, 1519885051, 1519885120, 1519885141, 1519887091,
1519939049, 1519939482, 1519945740, 1519971397, 1519975527, 1519987363,
1519988481, 1520004464, 1520033974, 1520093329, 1520179994, 1520204550,
1520233073, 1520237983, 1520238103, 1520241519, 1520241904, 1520263216,
1520290670, 1520349278, 1520370509, 1520406514, 1520436434, 1520447318,
1520456518, 1520461383, 1520501027, 1520522600, 1520542062, 1520590191,
1520618693, 1520621059, 1520626341, 1520627226, 1520630596, 1520637370,
1520664044, 1520676143, 1520689466, 1520717079, 1520724147, 1520754787,
1520788241, 1520806426, 1520818840, 1520829807, 1520839843, 1520839936,
1520891100, 1520897458, 1520921676, 1520933752), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), `infusion rate` = c(25.75, 30.75,
25.75, 25.81, 25.81, 25.75, 25.65, 25.65, 27.55, 18.47, 18.25,
16.25, 15.25, 13.25, 13.25, 15.25, 16.25, 15.25, 15.45, 12.45,
12.25, 12.45, 11.45, 11.5, 11.57, 13.57, 11.57, 10.57, 10.55,
11.55, 13.55, 13.52, 13.56, 13.64, 13.7, 13.67, 13.67, 13.65,
14.65, 14.61, 14.67, 14.69, 13.69, 13.67, 16.67, 21.67, 24.67,
29.67, 34.67, 29.67, 29.65, 24.65, 22.65, 19.65, 19.65, 17.65,
14.65, 14.63, 14.65, 15.65, 14.65, 15.65, 16.65, 15.65, 15.68,
15.71, 15.74, 15.81, 15.92, 15.89, 15.9, 15.94, 15.93, 14.94,
15.92, 16.03, 15.03, 15, 15.02, 14.96, 14.91, 14.93, 14.94, 14.94,
14.91, 14.92, 14.92, 14.92, 14.94, 14.95, 15.95, 14.95, 16.95,
19.95, 22.95, 25.95, 26.95, 26.93, 26.89, 23.89, 20.89, 18.89,
18.87, 16.87, 15.87, 15.87, 14.87, 17.87, 16.87, 16.98, 17.98,
16.98, 15.98, 0)), row.names = 2:115, class = "data.frame")
I need the output to be two columns of data: the day and the daily infusion volume (mL).
One possible solution is to use the foverlaps() function from the data.table package. foverlaps() finds all overlapping intervals (ranges, periods) by an overlap join:
library(data.table)
# coerce to data.table
setDT(DF)
# rename columns to syntactically valid names
setnames(DF, make.names(names(DF)))
DF
# create intervals (ranges) of infusion periods
DF_ranges <- DF[, .(start = head(date.time, -1L),
end = tail(date.time, -1L),
inf.rate = head(infusion.rate, -1L))]
setkey(DF_ranges, start, end)
# create sequence of calendar days (starting at midnight)
day_seq <- DF[, seq(lubridate::floor_date(min(date.time), "day"),
max(date.time), "1 day")]
# create intervals of days (from midnight to midnight)
day_ranges <- data.table(start = day_seq, end = day_seq + as.difftime(1, units = "days"))
# find all overlapping intervals (overlap join)
ovl <- foverlaps(day_ranges, DF_ranges)
# compute duration of infusion periods within each day
ovl[, inf.hours := difftime(pmin(end, i.end), pmax(start, i.start), units = "hours")]
# compute infusion volume for each period
ovl[, inf.vol := inf.rate * as.double(inf.hours)]
# aggregate by day
ovl[, .(inf.vol.per.day = sum(inf.vol)), by = .(day = as.Date(i.start))][
# drop first and last day
-c(1L, .N)]
day inf.vol.per.day
1: 2018-02-20 455.7107
2: 2018-02-21 324.6403
3: 2018-02-22 293.5880
4: 2018-02-23 298.9512
5: 2018-02-24 324.7212
6: 2018-02-25 327.3658
7: 2018-02-26 338.3609
8: 2018-02-27 338.1620
9: 2018-02-28 507.9508
10: 2018-03-01 368.7672
11: 2018-03-02 379.4539
12: 2018-03-03 381.9141
13: 2018-03-04 381.5335
14: 2018-03-05 360.6198
15: 2018-03-06 358.0437
16: 2018-03-07 358.3588
17: 2018-03-08 361.6632
18: 2018-03-09 421.2107
19: 2018-03-10 567.7771
20: 2018-03-11 413.8286
21: 2018-03-12 403.4742
day inf.vol.per.day
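If dropping rows by position feels fragile, the partial first and last days can also be excluded by date. A small sketch, assuming DF and ovl from above are still in scope:
daily <- ovl[, .(inf.vol.per.day = sum(inf.vol)), by = .(day = as.Date(i.start))]
# keep only complete days, i.e. dates strictly between the first and last study dates
daily[day > as.Date(min(DF$date.time)) & day < as.Date(max(DF$date.time))]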
The intermediate results are:
DF_ranges
start end inf.rate
1: 2018-02-19 12:27:31 2018-02-19 13:31:51 25.75
2: 2018-02-19 13:31:51 2018-02-19 14:17:57 30.75
3: 2018-02-19 14:17:57 2018-02-19 14:23:21 25.75
4: 2018-02-19 14:23:21 2018-02-19 15:17:34 25.81
5: 2018-02-19 15:17:34 2018-02-19 15:29:40 25.81
---
109: 2018-03-12 07:30:43 2018-03-12 07:32:16 16.87
110: 2018-03-12 07:32:16 2018-03-12 21:45:00 16.98
111: 2018-03-12 21:45:00 2018-03-12 23:30:58 17.98
112: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98
113: 2018-03-13 06:14:36 2018-03-13 09:35:52 15.98
day_ranges
start end
1: 2018-02-19 2018-02-20
2: 2018-02-20 2018-02-21
3: 2018-02-21 2018-02-22
4: 2018-02-22 2018-02-23
5: 2018-02-23 2018-02-24
6: 2018-02-24 2018-02-25
7: 2018-02-25 2018-02-26
8: 2018-02-26 2018-02-27
9: 2018-02-27 2018-02-28
10: 2018-02-28 2018-03-01
11: 2018-03-01 2018-03-02
12: 2018-03-02 2018-03-03
13: 2018-03-03 2018-03-04
14: 2018-03-04 2018-03-05
15: 2018-03-05 2018-03-06
16: 2018-03-06 2018-03-07
17: 2018-03-07 2018-03-08
18: 2018-03-08 2018-03-09
19: 2018-03-09 2018-03-10
20: 2018-03-10 2018-03-11
21: 2018-03-11 2018-03-12
22: 2018-03-12 2018-03-13
23: 2018-03-13 2018-03-14
start end
foverlaps(day_ranges, DF_ranges)
start end inf.rate i.start i.end
1: 2018-02-19 12:27:31 2018-02-19 13:31:51 25.75 2018-02-19 2018-02-20
2: 2018-02-19 13:31:51 2018-02-19 14:17:57 30.75 2018-02-19 2018-02-20
3: 2018-02-19 14:17:57 2018-02-19 14:23:21 25.75 2018-02-19 2018-02-20
4: 2018-02-19 14:23:21 2018-02-19 15:17:34 25.81 2018-02-19 2018-02-20
5: 2018-02-19 15:17:34 2018-02-19 15:29:40 25.81 2018-02-19 2018-02-20
---
131: 2018-03-12 07:32:16 2018-03-12 21:45:00 16.98 2018-03-12 2018-03-13
132: 2018-03-12 21:45:00 2018-03-12 23:30:58 17.98 2018-03-12 2018-03-13
133: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98 2018-03-12 2018-03-13
134: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98 2018-03-13 2018-03-14
135: 2018-03-13 06:14:36 2018-03-13 09:35:52 15.98 2018-03-13 2018-03-14
ovl
start end inf.rate i.start i.end inf.hours inf.vol
1: 2018-02-19 12:27:31 2018-02-19 13:31:51 25.75 2018-02-19 2018-02-20 1.0722222 hours 27.609722
2: 2018-02-19 13:31:51 2018-02-19 14:17:57 30.75 2018-02-19 2018-02-20 0.7683333 hours 23.626250
3: 2018-02-19 14:17:57 2018-02-19 14:23:21 25.75 2018-02-19 2018-02-20 0.0900000 hours 2.317500
4: 2018-02-19 14:23:21 2018-02-19 15:17:34 25.81 2018-02-19 2018-02-20 0.9036111 hours 23.322203
5: 2018-02-19 15:17:34 2018-02-19 15:29:40 25.81 2018-02-19 2018-02-20 0.2016667 hours 5.205017
---
131: 2018-03-12 07:32:16 2018-03-12 21:45:00 16.98 2018-03-12 2018-03-13 14.2122222 hours 241.323533
132: 2018-03-12 21:45:00 2018-03-12 23:30:58 17.98 2018-03-12 2018-03-13 1.7661111 hours 31.754678
133: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98 2018-03-12 2018-03-13 0.4838889 hours 8.216433
134: 2018-03-12 23:30:58 2018-03-13 06:14:36 16.98 2018-03-13 2018-03-14 6.2433333 hours 106.011800
135: 2018-03-13 06:14:36 2018-03-13 09:35:52 15.98 2018-03-13 2018-03-14 3.3544444 hours 53.604022
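As an optional sanity check (a sketch, assuming ovl and DF_ranges are still in scope): summed over every day, including the partial first and last days, the overlap-join volumes should reproduce the cumulative volume computed directly from the rate-change intervals.
total_from_overlaps <- ovl[, sum(inf.vol)]
total_direct <- DF_ranges[, sum(inf.rate * as.double(difftime(end, start, units = "hours")))]
all.equal(total_from_overlaps, total_direct)  # should be TRUE (up to floating point)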
Related
Subsetting a gps-track dataset based on time intervals gathered from a second dataset
I have a large gps-track dataset and I want to extract only the positions taken while an observer was on duty. In other terms, I need to cut the gps-tracks in several transects in which an observer was watching. The watching periods are in a second DB in which the observer registered the start and end of (roughly hourly) watching periods, so that the start time and end time registered for each day marks the start and end of the watch period for that day in most cases. However, it can happen that the watching was paused for some reason and then restarted some time later on the same day, so that two consecutive annotations can have a time gap between them. I was trying with match() and dplyr:filter() functions but can't came out with a solution. Any idea would be greatly appreciated. Below it is a simplified example DB1 (very large gps track to subset) date time lat lon 1 18/04/2017 6:10 34.01 -53.07 2 18/04/2017 6:20 34.02 -53.09 3 18/04/2017 6:30 34.04 -53.10 4 18/04/2017 6:40 34.05 -53.11 5 18/04/2017 6:50 34.07 -53.13 6 18/04/2017 7:00 34.08 -53.14 7 18/04/2017 7:10 34.01 -53.07 8 18/04/2017 7:20 34.02 -53.09 9 18/04/2017 7:30 34.04 -53.10 . . . . . . . . . . . . . . . n 19/04/2017 6:10 34.05 -53.11 n+1 19/04/2017 6:20 34.07 -53.13 n+2 19/04/2017 6:30 34.08 -53.14 DB2 (watching periods) date start.watch end.watch 1 2017-04-18 05:00 06:10 2 2017-04-18 06:10 06:30 3 2017-04-18 06:30 06:45 4 2017-04-18 07:20 08:20 . . . . . . . . . . . . n 2017-04-19 06:20 07:20 n+1 2017-04-19 07:20 08:40 Resulting DB should be:` 1 18/04/2017 6:10 34.01 -53.07 2 18/04/2017 6:20 34.02 -53.09 3 18/04/2017 6:30 34.04 -53.10 4 18/04/2017 6:40 34.05 -53.11 8 18/04/2017 7:20 34.02 -53.09 9 18/04/2017 7:30 34.04 -53.10 n 19/04/2017 6:10 34.05 -53.11 n+1 19/04/2017 6:20 34.07 -53.13 n+2 19/04/2017 6:30 34.08 -53.14
Here's an alternative that does a range-based (fuzzy) join based on time overlaps. It uses data.table::foverlaps, which does require (at least for this join) that the two frames be proper data.table objects, because it needs the keys to be clearly set. This method has a few requirements: All timestamps are easily comparable numerically, I'll convert them to POSIXt objects; Keys are set for at least the second table (and might help in the first). The last two keys for each must be the beginning and end of each time interval; and Yes, you read that right, even the "single time observations" need two timestamp fields. NB: I use magrittr solely to break out the process into a pipeline of sorts; it is not at all required, just makes it easier to read. Also, I use copy() and setDT and then assign to a new variable primarily because (1) I iterated a few times but wanted to start with fresh data each time; and more importantly (2) because data.table operates in side-effect, I want to encourage you to try this but not kill your local data until you are comfortable working with it in side-effect. You can easily un-data.table-ize it after the fact. First, I'll set up the needed conditions. library(data.table) library(magrittr) DB1dt <- copy(DB1) %>% setDT() %>% .[, dt := as.POSIXct(paste(date, time), format = "%d/%m/%Y %H:%M") ] %>% # remove unneeded columns .[, c("date", "time") := NULL ] %>% .[, dt2 := dt ] %>% setkey(dt, dt2) DB2dt <- copy(DB2) %>% setDT() %>% .[, startdt := as.POSIXct(paste(date, start.watch), format = "%Y-%m-%d %H:%M") ] %>% .[, enddt := as.POSIXct(paste(date, end.watch), format = "%Y-%m-%d %H:%M") - 1e-5 ] %>% # remove unneeded columns .[, c("date", "start.watch", "end.watch") := NULL ] %>% setkey(startdt, enddt) DB1dt[1:2,] # lat lon dt dt2 # 1: 34.01 -53.07 2017-04-18 06:10:00 2017-04-18 06:10:00 # 2: 34.02 -53.09 2017-04-18 06:20:00 2017-04-18 06:20:00 DB2dt[1:2,] # startdt enddt # 1: 2017-04-18 05:00:00 2017-04-18 06:09:59 # 2: 2017-04-18 06:10:00 2017-04-18 06:29:59 FYI: the use of -1e-5 is because the "within"-join is closed on both ends ([a,b], in constrast to open-right [a,b)), so equality on enddt would match. Over to you if you want to keep this. 
From here, the overlapping join is simply: foverlaps(DB1dt, DB2dt, type = "within", nomatch = NULL) # startdt enddt lat lon dt dt2 # 1: 2017-04-18 06:10:00 2017-04-18 06:29:59 34.01 -53.07 2017-04-18 06:10:00 2017-04-18 06:10:00 # 2: 2017-04-18 06:10:00 2017-04-18 06:29:59 34.02 -53.09 2017-04-18 06:20:00 2017-04-18 06:20:00 # 3: 2017-04-18 06:30:00 2017-04-18 06:44:59 34.04 -53.10 2017-04-18 06:30:00 2017-04-18 06:30:00 # 4: 2017-04-18 06:30:00 2017-04-18 06:44:59 34.05 -53.11 2017-04-18 06:40:00 2017-04-18 06:40:00 # 5: 2017-04-18 07:20:00 2017-04-18 08:19:59 34.02 -53.09 2017-04-18 07:20:00 2017-04-18 07:20:00 # 6: 2017-04-18 07:20:00 2017-04-18 08:19:59 34.04 -53.10 2017-04-18 07:30:00 2017-04-18 07:30:00 # 7: 2017-04-19 06:20:00 2017-04-19 07:19:59 34.07 -53.13 2017-04-19 06:20:00 2017-04-19 06:20:00 # 8: 2017-04-19 06:20:00 2017-04-19 07:19:59 34.08 -53.14 2017-04-19 06:30:00 2017-04-19 06:30:00 Sample data: DB1 <- read.table(stringsAsFactors = FALSE, header = TRUE, text = " date time lat lon 18/04/2017 6:10 34.01 -53.07 18/04/2017 6:20 34.02 -53.09 18/04/2017 6:30 34.04 -53.10 18/04/2017 6:40 34.05 -53.11 18/04/2017 6:50 34.07 -53.13 18/04/2017 7:00 34.08 -53.14 18/04/2017 7:10 34.01 -53.07 18/04/2017 7:20 34.02 -53.09 18/04/2017 7:30 34.04 -53.10 19/04/2017 6:10 34.05 -53.11 19/04/2017 6:20 34.07 -53.13 19/04/2017 6:30 34.08 -53.14") DB2 <- read.table(stringsAsFactors = FALSE, header = TRUE, text = " date start.watch end.watch 2017-04-18 05:00 06:10 2017-04-18 06:10 06:30 2017-04-18 06:30 06:45 2017-04-18 07:20 08:20 2017-04-19 06:20 07:20 2017-04-19 07:20 08:40") Related reading: https://codereview.stackexchange.com/q/224705 https://github.com/Rdatatable/data.table/issues/3721
Here is, I think, the solution to your question. The code should be clear, but in brief, the key part is to create datetime columns and intervals with the lubridate package, and then use lubridate's %within% function to check if a given time is inside the given intervals. Hope this helps. library(tidyverse) library(lubridate) #> #> Attaching package: 'lubridate' #> The following object is masked from 'package:base': #> #> date db1 <- tribble(~date, ~time, ~lat, ~lon, "18/04/2017", "6:10", 34.01, -53.07, "18/04/2017", "6:20", 34.02, -53.09, "18/04/2017", "6:30", 34.04, -53.10, "18/04/2017", "6:40", 34.05, -53.11, "18/04/2017", "6:50", 34.07, -53.13, "18/04/2017", "7:00", 34.08, -53.14, "18/04/2017", "7:10", 34.01, -53.07, "18/04/2017", "7:20", 34.02, -53.09, "18/04/2017", "7:30", 34.04, -53.10 ) db2 <- tribble(~date, ~start.watch, ~end.watch, "2017-04-18", "05:00", "06:10", "2017-04-18", "06:10", "06:30", "2017-04-18", "06:30", "06:45", "2017-04-18", "07:20", "08:20") db2_intervals <- db2 %>% mutate(end_date = date) %>% unite("start_datetime", date, start.watch) %>% unite("end_datetime", end_date, end.watch) %>% transmute(interval = interval(start = ymd_hm(start_datetime), end = ymd_hm(end_datetime))) %>% pull(interval) db1 %>% unite("datetime", date, time) %>% mutate(datetime = lubridate::dmy_hm(datetime)) %>% filter(datetime %within% as.list(db2_intervals)) #> # A tibble: 6 x 3 #> datetime lat lon #> <dttm> <dbl> <dbl> #> 1 2017-04-18 06:10:00 34.0 -53.1 #> 2 2017-04-18 06:20:00 34.0 -53.1 #> 3 2017-04-18 06:30:00 34.0 -53.1 #> 4 2017-04-18 06:40:00 34.0 -53.1 #> 5 2017-04-18 07:20:00 34.0 -53.1 #> 6 2017-04-18 07:30:00 34.0 -53.1 Created on 2019-10-09 by the reprex package (v0.3.0)
How do I extract data from a data frame based on the months?
I have a data frame, df, that has date and two variables in it. I would like to either extract all of Oct-Dec data or delete the other months data from the data frame. I have put the data into a data frame but at the moment have the whole year, I just want to extract the wanted data. In future I will also be extracting just winter data. I have attached my chunk of my data frame, I tried using format() with just %m but couldn't get it to work. 14138 2017-09-15 4.655946e-01 0.0603515884 14139 2017-09-16 7.881137e-01 0.0479933304 14140 2017-09-17 5.018990e-01 0.0256871025 14141 2017-09-18 -1.583625e-01 -0.0040893990 14142 2017-09-19 -6.733220e-01 -0.0313100989 14143 2017-09-20 -1.225730e+00 -0.0587706331 14144 2017-09-21 -1.419133e+00 -0.0958125544 14145 2017-09-22 -1.338630e+00 -0.0902803173 14146 2017-09-23 -1.272554e+00 -0.0659170673 14147 2017-09-24 -1.132318e+00 -0.0387240370 14148 2017-09-25 -1.255414e+00 -0.0392615823 14149 2017-09-26 -1.497188e+00 -0.0438491356 14150 2017-09-27 -1.427622e+00 -0.0633879185 14151 2017-09-28 -1.051756e+00 -0.0992427127 14152 2017-09-29 -4.876309e-01 -0.1448044528 14153 2017-09-30 -6.829681e-02 -0.1749463647 14154 2017-10-01 -1.413768e-01 -0.2009916094 14155 2017-10-02 6.359742e-02 -0.1975848313 14156 2017-10-03 9.103277e-01 -0.1828581805 14157 2017-10-04 1.695776e+00 -0.1589352546 14158 2017-10-05 1.913918e+00 -0.1538234614 14159 2017-10-06 1.479714e+00 -0.1937094170 14160 2017-10-07 8.783669e-01 -0.1703790211 14161 2017-10-08 5.706581e-01 -0.1294144428 14162 2017-10-09 4.979405e-01 -0.0666569815 14163 2017-10-10 3.233477e-01 0.0072006102 14164 2017-10-11 3.057630e-01 0.0863445067 14165 2017-10-12 5.877673e-01 0.1097707831 14166 2017-10-13 1.208526e+00 0.1301967193 14167 2017-10-14 1.671705e+00 0.1728109268 14168 2017-10-15 1.810979e+00 0.2264911145 14169 2017-10-16 1.426651e+00 0.2702958315 14170 2017-10-17 1.241140e+00 0.3242637704 14171 2017-10-18 8.997498e-01 0.3879727861 14172 2017-10-19 5.594161e-01 0.4172990825 14173 2017-10-20 3.980254e-01 0.3915170864 14174 2017-10-21 2.138538e-01 0.3249736995 14175 2017-10-22 3.926440e-01 0.2224834840 14176 2017-10-23 2.268644e-01 0.0529143372 14177 2017-10-24 5.664923e-01 -0.0081443464 14178 2017-10-25 6.167520e-01 0.0312073984 14179 2017-10-26 7.751882e-02 0.0043897693 14180 2017-10-27 -5.634851e-02 -0.0726825266 14181 2017-10-28 -2.122061e-01 -0.1711305549 14182 2017-10-29 -8.500991e-01 -0.2068581639 14183 2017-10-30 -1.039685e+00 -0.2909120824 14184 2017-10-31 -3.057745e-01 -0.3933633317 14185 2017-11-01 -1.288774e-01 -0.3726346136 14186 2017-11-02 -5.608007e-03 -0.2425754386 14187 2017-11-03 4.853990e-01 -0.0503543980 14188 2017-11-04 5.822672e-01 0.0896130098 14189 2017-11-05 8.491505e-01 0.1299151006 14190 2017-11-06 1.052999e+00 0.0749888307 14191 2017-11-07 1.170470e+00 0.0287317882 14192 2017-11-08 7.919862e-01 0.0788187381 14193 2017-11-09 4.574565e-01 0.1539981316 14194 2017-11-10 4.552032e-01 0.2034393145 14195 2017-11-11 -3.621350e-01 0.2077476707 14196 2017-11-12 -8.053965e-01 0.1759558604 14197 2017-11-13 -8.307459e-01 0.1802858410 14198 2017-11-14 -9.421325e-01 0.2175529008 14199 2017-11-15 -9.880204e-01 0.2392924580 14200 2017-11-16 -7.448127e-01 0.2519253751 14201 2017-11-17 -8.081435e-01 0.2614254732 14202 2017-11-18 -1.216806e+00 0.2629971336 14203 2017-11-19 -1.122674e+00 0.3469995055 14204 2017-11-20 -1.242597e+00 0.4553094014 14205 2017-11-21 -1.294885e+00 0.5049438231 14206 2017-11-22 -9.325514e-01 0.4684133163 14207 2017-11-23 -4.632281e-01 0.4071673624 14208 2017-11-24 -9.689322e-02 
0.3710270269 14209 2017-11-25 4.704467e-01 0.4126721465 14210 2017-11-26 8.682453e-01 0.3745057653 14211 2017-11-27 5.105564e-01 0.2373454931 14212 2017-11-28 4.747265e-01 0.1650783370 14213 2017-11-29 5.905379e-01 0.2632154120 14214 2017-11-30 4.083787e-01 0.3888834762 14215 2017-12-01 3.451736e-01 0.5008047592 14216 2017-12-02 5.161312e-01 0.5388177242 14217 2017-12-03 7.109279e-01 0.5515360710 14218 2017-12-04 4.458635e-01 0.5127537202 14219 2017-12-05 -3.986610e-01 0.3896493238 14220 2017-12-06 -5.968253e-01 0.1095843268 14221 2017-12-07 -1.604398e-01 -0.2455506506 14222 2017-12-08 -4.384744e-01 -0.5801038215 14223 2017-12-09 -7.255016e-01 -0.8384627087 14224 2017-12-10 -9.691828e-01 -0.9223171538 14225 2017-12-11 -1.140588e+00 -0.8177806761 14226 2017-12-12 -1.956622e-01 -0.5250998474 14227 2017-12-13 -1.083792e-01 -0.3430768534 14228 2017-12-14 -8.016345e-02 -0.3163476104 14229 2017-12-15 8.899266e-01 -0.2813253830 14230 2017-12-16 1.322833e+00 -0.2545953062 14231 2017-12-17 1.547972e+00 -0.2275373110 14232 2017-12-18 2.164907e+00 -0.3217205817 14233 2017-12-19 2.276258e+00 -0.5773412429 14234 2017-12-20 1.862291e+00 -0.7728091393 14235 2017-12-21 1.125083e+00 -0.9099696881 14236 2017-12-22 7.737118e-01 -1.2441963604 14237 2017-12-23 7.863508e-01 -1.4802661587 14238 2017-12-24 4.313111e-01 -1.4111320559 14239 2017-12-25 -8.814799e-02 -1.0024805520 14240 2017-12-26 -3.615127e-01 -0.4943077147 14241 2017-12-27 -5.011363e-01 -0.0308588186 14242 2017-12-28 -8.474088e-01 0.3717555895 14243 2017-12-29 -7.283247e-01 0.8230450219 14244 2017-12-30 -4.566981e-01 1.2495961116 14245 2017-12-31 -4.577034e-01 1.4805369230 14246 2018-01-01 1.946166e-01 1.5310004017 14247 2018-01-02 5.203149e-01 1.5384595802 14248 2018-01-03 5.024570e-02 1.4036679018 14249 2018-01-04 -7.065297e-01 1.0749574137 14250 2018-01-05 -8.741815e-01 0.7608524752 14251 2018-01-06 1.589530e-01 0.7891084646 14252 2018-01-07 8.632378e-01 1.1230358751 As requested, the class is "Date".
You can use lubridate and base R:
library(lubridate)
dats[month(ymd(dats$V2)) >= 10, ]
# EDIT: if the date variable is already of class Date, it should simply be
dats[month(dats$V2) >= 10, ]
Or fully base, without any date handling:
dats[substr(dats$V2, 6, 7) %in% c("10", "11", "12"), ]
With data:
     V1         V2        V3         V4
1 14138 2017-09-15 0.4655946 0.06035159
2 14139 2017-09-16 0.7881137 0.04799333
...
From your question, it is unclear what format the date variable is in. Maybe add the output of class(your_date_variable) to the question. As a general rule, though, you'll want to use filter from the dplyr package. Something like this:
new_data <- data %>% filter(format(date_variable, "%m") >= 10)
This might change slightly depending on the class of your date variable.
Assuming the 'date_variable' is of class Date, extract the month and do a comparison in filter (action verb from dplyr):
library(dplyr)
library(lubridate)
data %>% filter(month(date_variable) >= 10)
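For the Dec-Feb (winter) subset mentioned in the question, the same idea works with %in% instead of >=. A sketch, assuming the dplyr/lubridate approach above:
library(dplyr)
library(lubridate)
# keep only December, January and February rows
winter <- data %>% filter(month(date_variable) %in% c(12, 1, 2))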
How to plot lagged data against other data in R
I would like to lag one variable by, say, 10 time steps and plot it against the other variable which remains the same. I would like to do this for various lags to see if there is a time period that the first variable influences the other. The data I have is daily and after lagging I am separating into Dec-Feb data only. The problem I am having is the plot and correlation between the lagged variable and the other data is coming out the same as the non-lagged plot and correlation every time. I am not sure how to achieve this. A sample of my data frame "data" can be seen below. Date x y 14158 2017-10-05 1.913918e+00 -0.1538234614 14159 2017-10-06 1.479714e+00 -0.1937094170 14160 2017-10-07 8.783669e-01 -0.1703790211 14161 2017-10-08 5.706581e-01 -0.1294144428 14162 2017-10-09 4.979405e-01 -0.0666569815 14163 2017-10-10 3.233477e-01 0.0072006102 14164 2017-10-11 3.057630e-01 0.0863445067 14165 2017-10-12 5.877673e-01 0.1097707831 14166 2017-10-13 1.208526e+00 0.1301967193 14167 2017-10-14 1.671705e+00 0.1728109268 14168 2017-10-15 1.810979e+00 0.2264911145 14169 2017-10-16 1.426651e+00 0.2702958315 14170 2017-10-17 1.241140e+00 0.3242637704 14171 2017-10-18 8.997498e-01 0.3879727861 14172 2017-10-19 5.594161e-01 0.4172990825 14173 2017-10-20 3.980254e-01 0.3915170864 14174 2017-10-21 2.138538e-01 0.3249736995 14175 2017-10-22 3.926440e-01 0.2224834840 14176 2017-10-23 2.268644e-01 0.0529143372 14177 2017-10-24 5.664923e-01 -0.0081443464 14178 2017-10-25 6.167520e-01 0.0312073984 14179 2017-10-26 7.751882e-02 0.0043897693 14180 2017-10-27 -5.634851e-02 -0.0726825266 14181 2017-10-28 -2.122061e-01 -0.1711305549 14182 2017-10-29 -8.500991e-01 -0.2068581639 14183 2017-10-30 -1.039685e+00 -0.2909120824 14184 2017-10-31 -3.057745e-01 -0.3933633317 14185 2017-11-01 -1.288774e-01 -0.3726346136 14186 2017-11-02 -5.608007e-03 -0.2425754386 14187 2017-11-03 4.853990e-01 -0.0503543980 14188 2017-11-04 5.822672e-01 0.0896130098 14189 2017-11-05 8.491505e-01 0.1299151006 14190 2017-11-06 1.052999e+00 0.0749888307 14191 2017-11-07 1.170470e+00 0.0287317882 14192 2017-11-08 7.919862e-01 0.0788187381 14193 2017-11-09 4.574565e-01 0.1539981316 14194 2017-11-10 4.552032e-01 0.2034393145 14195 2017-11-11 -3.621350e-01 0.2077476707 14196 2017-11-12 -8.053965e-01 0.1759558604 14197 2017-11-13 -8.307459e-01 0.1802858410 14198 2017-11-14 -9.421325e-01 0.2175529008 14199 2017-11-15 -9.880204e-01 0.2392924580 14200 2017-11-16 -7.448127e-01 0.2519253751 14201 2017-11-17 -8.081435e-01 0.2614254732 14202 2017-11-18 -1.216806e+00 0.2629971336 14203 2017-11-19 -1.122674e+00 0.3469995055 14204 2017-11-20 -1.242597e+00 0.4553094014 14205 2017-11-21 -1.294885e+00 0.5049438231 14206 2017-11-22 -9.325514e-01 0.4684133163 14207 2017-11-23 -4.632281e-01 0.4071673624 14208 2017-11-24 -9.689322e-02 0.3710270269 14209 2017-11-25 4.704467e-01 0.4126721465 14210 2017-11-26 8.682453e-01 0.3745057653 14211 2017-11-27 5.105564e-01 0.2373454931 14212 2017-11-28 4.747265e-01 0.1650783370 14213 2017-11-29 5.905379e-01 0.2632154120 14214 2017-11-30 4.083787e-01 0.3888834762 14215 2017-12-01 3.451736e-01 0.5008047592 14216 2017-12-02 5.161312e-01 0.5388177242 14217 2017-12-03 7.109279e-01 0.5515360710 14218 2017-12-04 4.458635e-01 0.5127537202 14219 2017-12-05 -3.986610e-01 0.3896493238 14220 2017-12-06 -5.968253e-01 0.1095843268 14221 2017-12-07 -1.604398e-01 -0.2455506506 14222 2017-12-08 -4.384744e-01 -0.5801038215 14223 2017-12-09 -7.255016e-01 -0.8384627087 14224 2017-12-10 -9.691828e-01 -0.9223171538 14225 2017-12-11 -1.140588e+00 
-0.8177806761 14226 2017-12-12 -1.956622e-01 -0.5250998474 14227 2017-12-13 -1.083792e-01 -0.3430768534 14228 2017-12-14 -8.016345e-02 -0.3163476104 14229 2017-12-15 8.899266e-01 -0.2813253830 14230 2017-12-16 1.322833e+00 -0.2545953062 14231 2017-12-17 1.547972e+00 -0.2275373110 14232 2017-12-18 2.164907e+00 -0.3217205817 14233 2017-12-19 2.276258e+00 -0.5773412429 14234 2017-12-20 1.862291e+00 -0.7728091393 14235 2017-12-21 1.125083e+00 -0.9099696881 14236 2017-12-22 7.737118e-01 -1.2441963604 14237 2017-12-23 7.863508e-01 -1.4802661587 14238 2017-12-24 4.313111e-01 -1.4111320559 14239 2017-12-25 -8.814799e-02 -1.0024805520 14240 2017-12-26 -3.615127e-01 -0.4943077147 14241 2017-12-27 -5.011363e-01 -0.0308588186 14242 2017-12-28 -8.474088e-01 0.3717555895 14243 2017-12-29 -7.283247e-01 0.8230450219 14244 2017-12-30 -4.566981e-01 1.2495961116 14245 2017-12-31 -4.577034e-01 1.4805369230 14246 2018-01-01 1.946166e-01 1.5310004017 14247 2018-01-02 5.203149e-01 1.5384595802 14248 2018-01-03 5.024570e-02 1.4036679018 14249 2018-01-04 -7.065297e-01 1.0749574137 14250 2018-01-05 -8.741815e-01 0.7608524752 14251 2018-01-06 1.589530e-01 0.7891084646 14252 2018-01-07 8.632378e-01 1.1230358751 I am using lagged <- lag(ts(x), k=10) This is so the tsp isn't ignored. However, when I do cor(data$x, data$y) and cor(lagged, data$y) the result is the same, where I would have thought it would have been different. How do I get this lag to work before I can go ahead separate via date? Many thanks!
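One likely explanation (a sketch, not from the original thread): stats::lag() on a ts object only shifts the time attribute (tsp); the underlying values are unchanged, so cor() pairs exactly the same elements as before. To shift the values themselves, dplyr::lag() on the plain vector can be used, assuming the data frame is called data:
library(dplyr)
# shift x down by 10 rows; the first 10 entries become NA
lagged_x <- dplyr::lag(data$x, n = 10)
# correlation of y with x from 10 time steps earlier, dropping the NA rows
cor(lagged_x, data$y, use = "complete.obs")
# compare with the unlagged correlation
cor(data$x, data$y)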
How do I fix the error "wrong embedding dimension" in the cajolst R function?
When I try to use cajolst function from urca package I get a strange error. would you please guide me how can i confront the problem? result<-urca::cajolst(data ,trend = FALSE, K = 2, season = NULL) Error in embed(diff(x), K) : wrong embedding dimension. dates A G 2016-11-30 0 0 2016-12-01 -3.53 3.198 2016-12-02 -2.832 8.703 2016-12-04 -2.666 7.799 2016-12-05 -0.54 7.701 2016-12-06 -1.296 4.685 2016-12-07 -1.785 -4.587 2016-12-08 -6.834 -3.696 2016-12-09 -9.624 -5.461 2016-12-11 -11.374 -0.423 2016-12-12 -6.037 -1.614 2016-12-13 -5.934 -3.231 2016-12-14 -7.279 1.072 2016-12-15 -7.859 -4.823 2016-12-16 -15.132 10.838 2016-12-19 -15.345 11.5 2016-12-20 -15.673 6.639 2016-12-21 -15.391 11.162 2016-12-22 -14.357 7.032 2016-12-23 -14.99 12.355 2016-12-26 -15.626 10.944 2016-12-27 -12.297 10.215 2016-12-28 -13.967 5.957 2016-12-29 -12.946 3.446 2016-12-30 -19.681 10.274 2017-01-02 -18.24 8.781 2017-01-03 -16.83 1.116 2017-01-04 -18.189 -0.036 2017-01-05 -15.897 -1.441 2017-01-06 -20.196 -8.534 2017-01-09 -14.57 -28.768 2017-01-10 -13.27 -29.821 2017-01-11 -8.85 -38.881 2017-01-12 -6.375 -50.885 2017-01-13 -8.056 -51.321 2017-01-16 -5.217 -63.619 2017-01-17 -4.75 -39.163 2017-01-18 3.505 -46.309 2017-01-19 10.939 -45.825 2017-01-20 9.248 -42.973 2017-01-23 9.532 -33.396 2017-01-24 4.235 -31.38 2017-01-25 -1.885 -19.21 2017-01-26 -5.027 -15.74 2017-01-27 0.015 -23.029 2017-01-30 -0.685 -30.773 2017-01-31 -2.692 -25.544 2017-02-01 -2.654 -17.912 2017-02-02 4.002 -43.309 2017-02-03 4.813 -52.627 2017-02-06 7.049 -49.965 2017-02-07 10.003 -40.568 2017-02-08 8.996 -39.828 2017-02-09 7.047 -41.19 2017-02-10 7.656 -50.853 2017-02-13 4.986 -41.318 2017-02-14 8.493 -51.946 2017-02-15 12.547 -59.538 2017-02-16 10.327 -54.496 2017-02-17 7.09 -57.571 2017-02-20 11.633 -54.91 2017-02-21 12.664 -51.597 2017-02-22 16.103 -57.819 2017-02-23 14.25 -51.336 2017-02-24 7.794 -54.898 2017-02-27 15.27 -55.754 2017-02-28 19.984 -58.37 2017-03-01 23.899 -70.73 2017-03-02 16.63 -56.29 2017-03-03 16.443 -55.858 2017-03-06 17.901 -59.377 2017-03-07 19.067 -64.383 2017-03-08 17.219 -57.829 2017-03-09 15.694 -55.022 2017-03-10 17.351 -60.431 2017-03-13 18.945 -59.79 2017-03-14 20.001 -64.848 2017-03-15 23.852 -73.806 2017-03-16 22.697 -64.191 2017-03-17 26.892 -65.328 2017-03-20 29.221 -72.764 2017-03-21 25.165 -53.427 2017-03-22 22.998 -51.676 2017-03-23 20.072 -40.57 2017-03-24 20.758 -43.654 2017-03-27 20.062 -33.672 2017-03-28 22.066 -47.184 2017-03-29 22.363 -54.57 2017-03-30 20.684 -48.199 2017-03-31 17.056 -40.887 2017-04-03 19.12 -39.618 2017-04-04 16.359 -37.1 2017-04-05 18.643 -32.734 2017-04-06 14.708 -30.455 2017-04-07 8.403 -33.553 2017-04-10 6.072 -29.048 2017-04-11 5.186 -20.696 2017-04-12 4.248 -20.924 2017-04-13 12.803 -31.075 2017-04-14 12.566 -29.768 2017-04-17 14.065 -28.906 2017-04-18 14.5 4.121 2017-04-19 13.865 8.835 2017-04-20 16.126 6.191 2017-04-21 17.591 3.77 2017-04-24 22.3 -2.497 2017-04-25 22.731 7.408 2017-04-26 19.146 18.45 2017-04-27 19.052 25.541 2017-04-28 21.889 26.878 2017-05-01 27.323 14.362 2017-05-02 29.93 17.525 2017-05-03 19.835 29.856 2017-05-04 19.683 36.72 2017-05-05 13.545 41.055 2017-05-08 14.165 43.544 2017-05-09 11.325 49.978 2017-05-10 10.143 47.072 2017-05-11 13.718 38.901 2017-05-12 14.216 36.017 2017-05-15 13.701 33.797 2017-05-16 13.505 33.867 2017-05-17 13.456 38.004 2017-05-18 12.613 37.758 2017-05-19 11.166 40.367 2017-05-22 12.221 34.022 2017-05-23 13.682 29.793 2017-05-24 10.05 26.701 2017-05-25 10.122 31.394 2017-05-26 7.592 20.073 2017-05-29 6.796 23.809 2017-05-30 
9.638 16.1 2017-05-31 7.983 29.043 2017-06-01 3.594 39.557 2017-06-02 8.763 27.863 2017-06-05 12.157 22.397 2017-06-06 13.383 19.053 2017-06-07 20.52 17.449 2017-06-08 19.534 -1.615 2017-06-09 16.011 -1.989 2017-06-12 9.153 -9.294 2017-06-13 4.295 -0.897 2017-06-14 9.743 -9.818 2017-06-15 10.386 -8.255 2017-06-16 11.983 -12.522 2017-06-19 9.513 -12.931 2017-06-20 10.298 -21.024 2017-06-21 11.087 -11.801 2017-06-22 4.472 -9.048 2017-06-23 9.416 -9.592 2017-06-26 9.686 -12.006 2017-06-27 6.424 -2.632 2017-06-28 3.062 -1.016 2017-06-29 5.593 -0.825 2017-06-30 3.531 0.914 2017-07-03 3.208 -2.596 2017-07-04 -6.373 4.289 2017-07-05 -5.149 5.917 2017-07-06 -6.104 12.75 2017-07-07 -9.565 1.615 2017-07-10 -8.961 -0.053 2017-07-11 -4.065 -8.541 2017-07-12 -10.133 -11.286 2017-07-13 -6.223 -15.181 2017-07-14 -1.524 -14.396 2017-07-17 -1.613 -14.61 2017-07-18 5.781 -35.473 2017-07-19 8.243 -44.186 2017-07-20 7.665 -49.857 2017-07-21 0.485 -41.286 2017-07-24 -0.638 -39.127 2017-07-25 0.767 -40.952 2017-07-26 3.566 -44.388 2017-07-27 6.834 -42.543 2017-07-28 1.306 -37.657 2017-07-31 5.839 -34.048 2017-08-01 5.838 -28.939 2017-08-02 7.298 -26.566 2017-08-03 6.804 -32.876 2017-08-04 8.989 -38.618 2017-08-07 8.862 -36.676 2017-08-08 8.234 -40.893 2017-08-09 7.39 -35.16 2017-08-10 8.593 -35.555 2017-08-11 7.253 -35.175 2017-08-14 5.593 -33.644 2017-08-15 4.528 -37.82 2017-08-16 6.752 -53.217 2017-08-17 6.284 -49.252 2017-08-18 4.765 -55.602 2017-08-21 3.905 -54.32 2017-08-22 1.76 -57.853 2017-08-23 0.406 -58.925 2017-08-24 -2.438 -58.098 2017-08-25 -0.791 -56.682 2017-08-28 2.173 -51.278 2017-08-29 2.523 -54.353 2017-08-30 4.482 -46.325 2017-08-31 0.246 -52.567 2017-09-01 -4.214 -53.636 2017-09-04 -4.548 -52.735 2017-09-05 -1.781 -50.421 2017-09-06 -10.463 -51.122 2017-09-07 -13.119 -52.433 2017-09-08 -11.716 -43.493 2017-09-11 -16.15 -43.142 2017-09-12 -12.478 -29.335 2017-09-13 -16.457 -31.697 2017-09-14 -14.615 -15.13 2017-09-15 -13.911 3.023
One of the issue is that the 'Date' column is also included and secondly, the season is not needed, it can be FALSE or specify an integer value library(urca) out <- cajolst(data[-1] ,trend = FALSE, K = 2, season =FALSE) If there is a season effect and it is `quarterly, the value would be 4 out1 <- cajolst(data[-1] ,trend = FALSE, K = 2, season = 4) out1 ##################################################### # Johansen-Procedure Unit Root / Cointegration Test # ##################################################### #The value of the test statistic is: 3.6212 13.2233 data data <- structure(list(dates = c("2016-11-30", "2016-12-01", "2016-12-02", "2016-12-04", "2016-12-05", "2016-12-06", "2016-12-07", "2016-12-08", "2016-12-09", "2016-12-11", "2016-12-12", "2016-12-13", "2016-12-14", "2016-12-15", "2016-12-16", "2016-12-19", "2016-12-20", "2016-12-21", "2016-12-22", "2016-12-23", "2016-12-26", "2016-12-27", "2016-12-28", "2016-12-29", "2016-12-30", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05", "2017-01-06", "2017-01-09", "2017-01-10", "2017-01-11", "2017-01-12", "2017-01-13", "2017-01-16", "2017-01-17", "2017-01-18", "2017-01-19", "2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25", "2017-01-26", "2017-01-27", "2017-01-30", "2017-01-31", "2017-02-01", "2017-02-02", "2017-02-03", "2017-02-06", "2017-02-07", "2017-02-08", "2017-02-09", "2017-02-10", "2017-02-13", "2017-02-14", "2017-02-15", "2017-02-16", "2017-02-17", "2017-02-20", "2017-02-21", "2017-02-22", "2017-02-23", "2017-02-24", "2017-02-27", "2017-02-28", "2017-03-01", "2017-03-02", "2017-03-03", "2017-03-06", "2017-03-07", "2017-03-08", "2017-03-09", "2017-03-10", "2017-03-13", "2017-03-14", "2017-03-15", "2017-03-16", "2017-03-17", "2017-03-20", "2017-03-21", "2017-03-22", "2017-03-23", "2017-03-24", "2017-03-27", "2017-03-28", "2017-03-29", "2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04", "2017-04-05", "2017-04-06", "2017-04-07", "2017-04-10", "2017-04-11", "2017-04-12", "2017-04-13", "2017-04-14", "2017-04-17", "2017-04-18", "2017-04-19", "2017-04-20", "2017-04-21", "2017-04-24", "2017-04-25", "2017-04-26", "2017-04-27", "2017-04-28", "2017-05-01", "2017-05-02", "2017-05-03", "2017-05-04", "2017-05-05", "2017-05-08", "2017-05-09", "2017-05-10", "2017-05-11", "2017-05-12", "2017-05-15", "2017-05-16", "2017-05-17", "2017-05-18", "2017-05-19", "2017-05-22", "2017-05-23", "2017-05-24", "2017-05-25", "2017-05-26", "2017-05-29", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02", "2017-06-05", "2017-06-06", "2017-06-07", "2017-06-08", "2017-06-09", "2017-06-12", "2017-06-13", "2017-06-14", "2017-06-15", "2017-06-16", "2017-06-19", "2017-06-20", "2017-06-21", "2017-06-22", "2017-06-23", "2017-06-26", "2017-06-27", "2017-06-28", "2017-06-29", "2017-06-30", "2017-07-03", "2017-07-04", "2017-07-05", "2017-07-06", "2017-07-07", "2017-07-10", "2017-07-11", "2017-07-12", "2017-07-13", "2017-07-14", "2017-07-17", "2017-07-18", "2017-07-19", "2017-07-20", "2017-07-21", "2017-07-24", "2017-07-25", "2017-07-26", "2017-07-27", "2017-07-28", "2017-07-31", "2017-08-01", "2017-08-02", "2017-08-03", "2017-08-04", "2017-08-07", "2017-08-08", "2017-08-09", "2017-08-10", "2017-08-11", "2017-08-14", "2017-08-15", "2017-08-16", "2017-08-17", "2017-08-18", "2017-08-21", "2017-08-22", "2017-08-23", "2017-08-24", "2017-08-25", "2017-08-28", "2017-08-29", "2017-08-30", "2017-08-31", "2017-09-01", "2017-09-04", "2017-09-05", "2017-09-06", "2017-09-07", "2017-09-08", "2017-09-11", "2017-09-12", "2017-09-13", "2017-09-14", "2017-09-15"), A = 
c(0, -3.53, -2.832, -2.666, -0.54, -1.296, -1.785, -6.834, -9.624, -11.374, -6.037, -5.934, -7.279, -7.859, -15.132, -15.345, -15.673, -15.391, -14.357, -14.99, -15.626, -12.297, -13.967, -12.946, -19.681, -18.24, -16.83, -18.189, -15.897, -20.196, -14.57, -13.27, -8.85, -6.375, -8.056, -5.217, -4.75, 3.505, 10.939, 9.248, 9.532, 4.235, -1.885, -5.027, 0.015, -0.685, -2.692, -2.654, 4.002, 4.813, 7.049, 10.003, 8.996, 7.047, 7.656, 4.986, 8.493, 12.547, 10.327, 7.09, 11.633, 12.664, 16.103, 14.25, 7.794, 15.27, 19.984, 23.899, 16.63, 16.443, 17.901, 19.067, 17.219, 15.694, 17.351, 18.945, 20.001, 23.852, 22.697, 26.892, 29.221, 25.165, 22.998, 20.072, 20.758, 20.062, 22.066, 22.363, 20.684, 17.056, 19.12, 16.359, 18.643, 14.708, 8.403, 6.072, 5.186, 4.248, 12.803, 12.566, 14.065, 14.5, 13.865, 16.126, 17.591, 22.3, 22.731, 19.146, 19.052, 21.889, 27.323, 29.93, 19.835, 19.683, 13.545, 14.165, 11.325, 10.143, 13.718, 14.216, 13.701, 13.505, 13.456, 12.613, 11.166, 12.221, 13.682, 10.05, 10.122, 7.592, 6.796, 9.638, 7.983, 3.594, 8.763, 12.157, 13.383, 20.52, 19.534, 16.011, 9.153, 4.295, 9.743, 10.386, 11.983, 9.513, 10.298, 11.087, 4.472, 9.416, 9.686, 6.424, 3.062, 5.593, 3.531, 3.208, -6.373, -5.149, -6.104, -9.565, -8.961, -4.065, -10.133, -6.223, -1.524, -1.613, 5.781, 8.243, 7.665, 0.485, -0.638, 0.767, 3.566, 6.834, 1.306, 5.839, 5.838, 7.298, 6.804, 8.989, 8.862, 8.234, 7.39, 8.593, 7.253, 5.593, 4.528, 6.752, 6.284, 4.765, 3.905, 1.76, 0.406, -2.438, -0.791, 2.173, 2.523, 4.482, 0.246, -4.214, -4.548, -1.781, -10.463, -13.119, -11.716, -16.15, -12.478, -16.457, -14.615, -13.911), G = c(0, 3.198, 8.703, 7.799, 7.701, 4.685, -4.587, -3.696, -5.461, -0.423, -1.614, -3.231, 1.072, -4.823, 10.838, 11.5, 6.639, 11.162, 7.032, 12.355, 10.944, 10.215, 5.957, 3.446, 10.274, 8.781, 1.116, -0.036, -1.441, -8.534, -28.768, -29.821, -38.881, -50.885, -51.321, -63.619, -39.163, -46.309, -45.825, -42.973, -33.396, -31.38, -19.21, -15.74, -23.029, -30.773, -25.544, -17.912, -43.309, -52.627, -49.965, -40.568, -39.828, -41.19, -50.853, -41.318, -51.946, -59.538, -54.496, -57.571, -54.91, -51.597, -57.819, -51.336, -54.898, -55.754, -58.37, -70.73, -56.29, -55.858, -59.377, -64.383, -57.829, -55.022, -60.431, -59.79, -64.848, -73.806, -64.191, -65.328, -72.764, -53.427, -51.676, -40.57, -43.654, -33.672, -47.184, -54.57, -48.199, -40.887, -39.618, -37.1, -32.734, -30.455, -33.553, -29.048, -20.696, -20.924, -31.075, -29.768, -28.906, 4.121, 8.835, 6.191, 3.77, -2.497, 7.408, 18.45, 25.541, 26.878, 14.362, 17.525, 29.856, 36.72, 41.055, 43.544, 49.978, 47.072, 38.901, 36.017, 33.797, 33.867, 38.004, 37.758, 40.367, 34.022, 29.793, 26.701, 31.394, 20.073, 23.809, 16.1, 29.043, 39.557, 27.863, 22.397, 19.053, 17.449, -1.615, -1.989, -9.294, -0.897, -9.818, -8.255, -12.522, -12.931, -21.024, -11.801, -9.048, -9.592, -12.006, -2.632, -1.016, -0.825, 0.914, -2.596, 4.289, 5.917, 12.75, 1.615, -0.053, -8.541, -11.286, -15.181, -14.396, -14.61, -35.473, -44.186, -49.857, -41.286, -39.127, -40.952, -44.388, -42.543, -37.657, -34.048, -28.939, -26.566, -32.876, -38.618, -36.676, -40.893, -35.16, -35.555, -35.175, -33.644, -37.82, -53.217, -49.252, -55.602, -54.32, -57.853, -58.925, -58.098, -56.682, -51.278, -54.353, -46.325, -52.567, -53.636, -52.735, -50.421, -51.122, -52.433, -43.493, -43.142, -29.335, -31.697, -15.13, 3.023)), class = "data.frame", row.names = c(NA, -210L ))
How to convert data captured at 10 min intervals into 15 min interval data
I have a dataframe with the data below (the average of the values at timestamps 7:50 and 7:40 should be my value of A for timestamp 7:45):
Date_Time      | A
7/28/2017 8:00 | 443.75
7/28/2017 7:50 | 440.75
7/28/2017 7:45 | NA
7/28/2017 7:40 | 447.5
7/28/2017 7:30 | 448.75
7/28/2017 7:20 | 444.5
7/28/2017 7:15 | NA
7/28/2017 7:10 | 440.25
7/28/2017 7:00 | 447.5
I want to transform it into 15 min intervals, something like below, using the mean:
Date / Time    | Object Value
7/28/2017 8:00 | 465
7/28/2017 7:45 | 464.875
7/28/2017 7:30 | 464.75
7/28/2017 7:15 | 464.875
7/28/2017 7:00 | 465
Update: The OP has changed the desired output. Since I have no time to update my answer, I will leave it as it is. See my comment in the original post for how to use na.interpolation to fill in the missing values.
Original post: This solution assumes you calculated the average based on the values at 8:00, 7:30, and 7:00.
library(dplyr)
library(tidyr)
library(lubridate)
library(imputeTS)
dt2 <- dt %>%
  mutate(Date.Time = mdy_hm(Date.Time)) %>%
  filter(Date.Time %in% seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
  complete(Date.Time = seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
  mutate(Object.Value = na.interpolation(Object.Value)) %>%
  fill(Object.Name) %>%
  arrange(desc(Date.Time))
dt2
# A tibble: 5 x 3
  Date.Time           Object.Name Object.Value
  <dttm>              <chr>              <dbl>
1 2017-07-28 08:00:00 a                465.000
2 2017-07-28 07:45:00 a                464.875
3 2017-07-28 07:30:00 a                464.750
4 2017-07-28 07:15:00 a                464.875
5 2017-07-28 07:00:00 a                465.000
Data:
dt <- read.table(text = "'Date Time' 'Object Name' 'Object Value'
'7/28/2017 8:00' a 465
'7/28/2017 7:50' a 465
'7/28/2017 7:40' a 464.75
'7/28/2017 7:30' a 464.75
'7/28/2017 7:20' a 464.75
'7/28/2017 7:10' a 465
'7/28/2017 7:00' a 465",
header = TRUE, stringsAsFactors = FALSE)
If the values measured on the 10-minute intervals are time-integrated averages over that period, it's reasonable to average them to a different period. If these are instantaneous measurements, then it's more reasonable to smooth them as others have suggested. To take time-integrated averages measured on the 10-minute schedule and average those to the 15-minute schedule, you can use the intervalaverage package: library(data.table) library(intervalaverage) x <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:50", "7/28/2017 7:45", "7/28/2017 7:40", "7/28/2017 7:30", "7/28/2017 7:20", "7/28/2017 7:15", "7/28/2017 7:10", "7/28/2017 7:00"), A = c(443.75, 440.75, NA, 447.5, 448.75, 444.5, NA, 440.25, 447.5)), row.names = c(NA, -9L), class = "data.frame") y <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:45", "7/28/2017 7:30", "7/28/2017 7:15", "7/28/2017 7:00")), row.names = c(NA, -5L), class = "data.frame") setDT(x) setDT(y) x #> time A #> 1: 7/28/2017 8:00 443.75 #> 2: 7/28/2017 7:50 440.75 #> 3: 7/28/2017 7:45 NA #> 4: 7/28/2017 7:40 447.50 #> 5: 7/28/2017 7:30 448.75 #> 6: 7/28/2017 7:20 444.50 #> 7: 7/28/2017 7:15 NA #> 8: 7/28/2017 7:10 440.25 #> 9: 7/28/2017 7:00 447.50 y #> time #> 1: 7/28/2017 8:00 #> 2: 7/28/2017 7:45 #> 3: 7/28/2017 7:30 #> 4: 7/28/2017 7:15 #> 5: 7/28/2017 7:00 x[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")] setnames(x, "time","start_time") x[, start_time_integer:=as.integer(start_time)] y[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")] setnames(y, "time","start_time") y[, start_time_integer:=as.integer(start_time)] setkey(y, start_time) setkey(x, start_time) ##drop time times at 15 and 45 x <- x[!start_time %in% as.POSIXct(c("2017-07-28 07:45:00","2017-07-28 07:15:00"),tz="UTC")] x[, end_time_integer:=as.integer(start_time)+60L*10L-1L] x[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")] y[, end_time_integer:=as.integer(start_time)+60L*15L-1L] y[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")] x #> start_time A start_time_integer end_time_integer #> 1: 2017-07-28 07:00:00 447.50 1501225200 1501225799 #> 2: 2017-07-28 07:10:00 440.25 1501225800 1501226399 #> 3: 2017-07-28 07:20:00 444.50 1501226400 1501226999 #> 4: 2017-07-28 07:30:00 448.75 1501227000 1501227599 #> 5: 2017-07-28 07:40:00 447.50 1501227600 1501228199 #> 6: 2017-07-28 07:50:00 440.75 1501228200 1501228799 #> 7: 2017-07-28 08:00:00 443.75 1501228800 1501229399 #> end_time #> 1: 2017-07-28 07:09:59 #> 2: 2017-07-28 07:19:59 #> 3: 2017-07-28 07:29:59 #> 4: 2017-07-28 07:39:59 #> 5: 2017-07-28 07:49:59 #> 6: 2017-07-28 07:59:59 #> 7: 2017-07-28 08:09:59 y #> start_time start_time_integer end_time_integer end_time #> 1: 2017-07-28 07:00:00 1501225200 1501226099 2017-07-28 07:14:59 #> 2: 2017-07-28 07:15:00 1501226100 1501226999 2017-07-28 07:29:59 #> 3: 2017-07-28 07:30:00 1501227000 1501227899 2017-07-28 07:44:59 #> 4: 2017-07-28 07:45:00 1501227900 1501228799 2017-07-28 07:59:59 #> 5: 2017-07-28 08:00:00 1501228800 1501229699 2017-07-28 08:14:59 out <- intervalaverage(x,y,interval_vars=c("start_time_integer","end_time_integer"),value_vars="A") out[, start_time:=as.POSIXct(start_time_integer,origin="1969-12-31 24:00:00",tz="UTC")] out[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz="UTC")] out[, list(start_time,end_time, A)] #> start_time end_time A #> 1: 2017-07-28 07:00:00 2017-07-28 07:14:59 445.0833 #> 2: 2017-07-28 07:15:00 2017-07-28 07:29:59 443.0833 #> 3: 2017-07-28 
07:30:00 2017-07-28 07:44:59 448.3333 #> 4: 2017-07-28 07:45:00 2017-07-28 07:59:59 443.0000 #> 5: 2017-07-28 08:00:00 2017-07-28 08:14:59 NA #Note that this just equivalent to taking weighted.mean: weighted.mean(c(447.5,440.25),w=c(10,5)) #> [1] 445.0833 weighted.mean(c(440.25,444.5),w=c(5,10)) #> [1] 443.0833 #etc Note that the intervalaverage package requires integer columns defining closed intervals, hence the conversion to integer. integers are converted back to datetime (POSIXct) for readability.