"Gap filling" a time series in R

I have a daily time series as in the example below. I need to fill the NA values, but only for the morning hours (6:00 AM to 9:00 AM). Each morning gap should be filled with the average of the remaining hours of the same day, and likewise for every other day:
set.seed(3)
df <- data.frame(
  timestamp = seq(as.POSIXct('2022-01-01', tz = 'utc'),
                  as.POSIXct('2022-01-10 23:00', tz = 'utc'),
                  by = '1 hour'),
  value = runif(240)
)
df$value[runif(nrow(df)) < 0.3] <- NA

If I understand you correctly, this is one way to solve the task in dplyr:
df %>%
  dplyr::mutate(after = ifelse(lubridate::hour(timestamp) > 10, value, NA),
                day = format(timestamp, format = '%Y-%m-%d')) %>%
  dplyr::group_by(day) %>%
  dplyr::mutate(value = ifelse(lubridate::hour(timestamp) < 10 & is.na(value),
                               mean(after, na.rm = TRUE),
                               value)) %>%
  dplyr::ungroup() %>%
  dplyr::select(-after, -day)
# A tibble: 240 x 2
timestamp value
<dttm> <dbl>
1 2022-01-01 00:00:00 0.427
2 2022-01-01 01:00:00 0.808
3 2022-01-01 02:00:00 0.385
4 2022-01-01 03:00:00 0.427
5 2022-01-01 04:00:00 0.602
6 2022-01-01 05:00:00 0.604
7 2022-01-01 06:00:00 0.125
8 2022-01-01 07:00:00 0.295
9 2022-01-01 08:00:00 0.578
10 2022-01-01 09:00:00 0.631
# ... with 230 more rows
# ℹ Use `print(n = ...)` to see more rows
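The same idea can also be written without the helper column, computing the mean of the remaining hours of each day directly inside the grouped mutate. A sketch along those lines, assuming (as above) that "residual hours" means the hours after 9 AM, and filling only the 6:00-9:00 window:

```r
library(dplyr)
library(lubridate)

# same example data as in the question
set.seed(3)
df <- data.frame(
  timestamp = seq(as.POSIXct('2022-01-01', tz = 'utc'),
                  as.POSIXct('2022-01-10 23:00', tz = 'utc'), by = '1 hour'),
  value = runif(240)
)
df$value[runif(nrow(df)) < 0.3] <- NA

filled <- df %>%
  group_by(day = as.Date(timestamp)) %>%
  # replace NAs in the 6-9 AM window with the same-day mean of the post-9 AM hours
  mutate(value = ifelse(is.na(value) & hour(timestamp) >= 6 & hour(timestamp) <= 9,
                        mean(value[hour(timestamp) > 9], na.rm = TRUE),
                        value)) %>%
  ungroup() %>%
  select(-day)
```

NAs outside the morning window are deliberately left untouched.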


Related

Match two tables based on a time difference criterion

I have a data table (lv_timest) with time stamps every 3 hours for each date:
# A tibble: 6 × 5
LV0_mean LV1_mean LV2_mean Date_time Date
<dbl> <dbl> <dbl> <S3:POSIXct> <date>
1 0.778 -4.12 0.736 2016-12-28 00:00:00 2016-12-28
2 0.376 -0.234 0.388 2016-12-28 03:00:00 2016-12-28
3 0.409 1.46 0.241 2016-12-28 06:00:00 2016-12-28
4 0.760 2.07 0.460 2016-12-28 09:00:00 2016-12-28
5 0.759 2.91 0.735 2016-12-28 12:00:00 2016-12-28
6 0.857 3.00 0.803 2016-12-28 15:00:00 2016-12-28
from which I would like to extract the time stamps that match as closely as possible those of another table (event_timest):
# A tibble: 6 × 4
Event_number Date_time Date Date_time_new
<int> <S3: POSIXct> <date> <S3: POSIXct>
1 75 2016-12-28 08:00:00 2016-12-28 2016-12-28 08:00:00
2 123 2016-12-30 14:02:00 2016-12-30 2016-12-30 14:00:00
3 264 2017-01-07 06:12:00 2017-01-07 2017-01-07 06:00:00
4 317 2017-01-09 10:59:00 2017-01-09 2017-01-09 11:00:00
5 318 2017-01-09 13:31:00 2017-01-09 2017-01-09 14:00:00
6 369 2017-01-11 07:24:00 2017-01-11 2017-01-11 07:00:00
For example, for row 1 in table event_timest, I would extract row 4 from table lv_timest:
Event_number Date_time.x Date.x Date_time_new LV0_mean LV1_mean LV2_mean Date_time.y Date.y
<int> <S3: POSIXct> <date> <S3: POSIXct> <dbl> <dbl> <dbl> <S3: POSIXct> <date>
75 2016-12-28 08:00:00 2016-12-28 2016-12-28 08:00:00 0.760 2.07 0.460 2016-12-28 09:00:00 2016-12-28
In fact, the time difference should not be over one hour. I thought of using the fuzzyjoin package for this, and writing a function that computes the time difference between timestamps of the two table, as hours. However, fuzzy_inner_join replicates rows in the second table and takes several timestamps in the first table to match it.
require(lubridate)
require(fuzzyjoin)
# time interval as hours ≤ 1 hour
diff_timest <- function(x, y) { abs(x %--% y %/% hours(1)) <= 1 }
match_timest <- fuzzy_inner_join(event_timest, lv_timest,
                                 by = c("Date" = "Date",
                                        "Date_time_new" = "Date_time"),
                                 match_fun = list(`==`, diff_timest))
head(match_timest)
# A tibble: 6 × 9
Event_number Date_time.x Date.x Date_time_new LV0_mean LV1_mean LV2_mean Date_time.y Date.y
<int> <dttm> <date> <dttm> <dbl> <dbl> <dbl> <dttm> <date>
1 75 2016-12-28 08:00:00 2016-12-28 2016-12-28 08:00:00 0.760 2.07 0.460 2016-12-28 09:00:00 2016-12-28
2 123 2016-12-30 14:02:00 2016-12-30 2016-12-30 14:00:00 1.24 1.83 2.05 2016-12-30 15:00:00 2016-12-30
3 264 2017-01-07 06:12:00 2017-01-07 2017-01-07 06:00:00 -0.128 -5.43 2.72 2017-01-07 06:00:00 2017-01-07
4 317 2017-01-09 10:59:00 2017-01-09 2017-01-09 11:00:00 -0.0751 0.171 2.56 2017-01-09 09:00:00 2017-01-09
5 317 2017-01-09 10:59:00 2017-01-09 2017-01-09 11:00:00 -0.204 -0.797 2.28 2017-01-09 12:00:00 2017-01-09
6 318 2017-01-09 13:31:00 2017-01-09 2017-01-09 14:00:00 -0.204 -0.797 2.28 2017-01-09 12:00:00 2017-01-09
Would there be another way to do this?
Joining is always a procedure of first building all combinations of rows and then filtering. We can do this manually:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
datetimes_a <- tibble(
  id = seq(3),
  group = "A",
  datetime = c("2016-12-28 00:00:00", "2016-12-28 03:00:00", "2016-12-28 23:59:59") %>% as.POSIXct()
)
datetimes_b <- tibble(
  id = seq(3),
  group = "B",
  datetime = c("2016-12-28 00:00:10", "2016-12-28 03:20:00", "2016-12-29 00:00:02") %>% as.POSIXct()
)
datetimes_a %>%
  # start with cross product of all possible pairs
  expand_grid(datetimes_b %>% rename_all(~ paste0(.x, "_b"))) %>%
  mutate(diff = abs(datetime - datetime_b)) %>%
  # get shortest time difference
  group_by(id, id_b) %>%
  arrange(diff) %>%
  slice(1) %>%
  # time diff must be less than 1 hr
  filter(diff < hours(1))
#> # A tibble: 3 x 7
#> # Groups: id, id_b [3]
#> id group datetime id_b group_b datetime_b diff
#> <int> <chr> <dttm> <int> <chr> <dttm> <drtn>
#> 1 1 A 2016-12-28 00:00:00 1 B 2016-12-28 00:00:10 10 secs
#> 2 2 A 2016-12-28 03:00:00 2 B 2016-12-28 03:20:00 1200 secs
#> 3 3 A 2016-12-28 23:59:59 3 B 2016-12-29 00:00:02 3 secs
Created on 2022-02-08 by the reprex package (v2.0.1)
This works also if the nearest timepoint is on another date e.g. right before and after midnight.
I would suggest a standard join, followed by a grouped filter to the closest instance of each timestamp:
library(tidyverse)
library(lubridate)
match_timest <- event_timest %>%
  inner_join(lv_timest, by = "Date") %>%
  mutate(diff = abs(as.numeric(Date_time.x - Date_time.y, unit = "hours"))) %>%
  group_by(Date_time.y) %>%
  filter(diff <= 1 & diff == min(diff)) %>%
  ungroup() %>%
  select(!diff)
Note:
this will still match multiple rows if more than one row is at exactly the same shortest difference from the index timestamp.
this won't match timestamps from different dates (e.g., 23:59:59 on 1/1/22 won't be matched with 00:00:00 on 1/2/22). If you'd like to do that, you can use a full Cartesian join (full_join(lv_timest, by = character())) rather than the inner_join() above.

Calculate maximum values for three consecutive days of a tibble in R

I am doing time series analysis. Part of my data is as follow:
# A tibble: 6 x 3
time DOY Value
<dttm> <dbl> <dbl>
1 2015-01-08 12:30:00 8 0.664
2 2015-01-08 13:00:00 8 0.647
3 2015-01-11 14:00:00 11 0.669
4 2015-01-11 15:00:00 11 0.644
5 2015-02-04 12:30:00 35 0.664
6 2015-02-04 13:00:00 35 0.647
I would like to calculate the maximum values of 7 consecutive days of the data. For example:
# A tibble: 6 x 4
time DOY Value Max
<dttm> <dbl> <dbl> <dbl>
1 2015-01-08 12:30:00 8 0.664 11.669
2 2015-01-08 13:00:00 8 0.647 11.669
3 2015-01-11 14:00:00 11 0.669 11.669
4 2015-01-11 15:00:00 11 0.644 11.669
5 2015-02-04 12:30:00 35 0.664 35.664
6 2015-02-04 13:00:00 35 0.647 35.664
Welcome to R and Stack Overflow. As mentioned above, you will find many friends here if you provide a reproducible example and explain what you have done and/or where things go wrong for you. This helps others to help you.
Based on your data fragment, I do some basic operations that I think might help you. Still you may need to adapt the principles to your problem case.
data
I turned your example into a tibble. Please note, when you work with date, times, date-times I recommend you use the respective variable type. This will give you access to helpful functions, etc.
Please also note you showed a 6 × 3 tibble above. In fact, your data structure is effectively a 4-column tibble with Date, time, DOY, and Value!
library(dplyr) # basic dataframe/tibble operations
library(lubridate) # for datetime handling
df <- tribble(
~Date, ~time, ~DOY, ~Value
,"2015-01-08", "12:30:00", 8, 0.664
,"2015-01-08", "13:00:00", 8, 0.647
,"2015-01-11", "14:00:00", 11, 0.669
,"2015-01-11", "15:00:00", 11, 0.644
,"2015-02-04", "12:30:00", 35, 0.664
,"2015-02-04", "13:00:00", 35, 0.647
)
df <- df %>%
  mutate(timestamp = ymd_hms(paste(Date, time)))
This yields:
df
# A tibble: 6 x 5
Date time DOY Value timestamp
<chr> <chr> <dbl> <dbl> <dttm>
1 2015-01-08 12:30:00 8 0.664 2015-01-08 12:30:00
2 2015-01-08 13:00:00 8 0.647 2015-01-08 13:00:00
3 2015-01-11 14:00:00 11 0.669 2015-01-11 14:00:00
4 2015-01-11 15:00:00 11 0.644 2015-01-11 15:00:00
5 2015-02-04 12:30:00 35 0.664 2015-02-04 12:30:00
6 2015-02-04 13:00:00 35 0.647 2015-02-04 13:00:00
Note: timestamp is now a datetime object (dttm).
binning of data
It is not fully clear what your consecutive 7 days are and/or how you "group" them.
I assume you want to pick 7 days of a week.
As datetime is dttm, we can use the power of {lubridate} and extract the week from the datetime.
Note: you may want to bin/group your data differently. Think about what you want to achieve here and adapt this accordingly.
df <- df %>% mutate(bin = week(timestamp))
df
# A tibble: 6 x 6
Date time DOY Value timestamp bin
<chr> <chr> <dbl> <dbl> <dttm> <dbl>
1 2015-01-08 12:30:00 8 0.664 2015-01-08 12:30:00 2
2 2015-01-08 13:00:00 8 0.647 2015-01-08 13:00:00 2
3 2015-01-11 14:00:00 11 0.669 2015-01-11 14:00:00 2
4 2015-01-11 15:00:00 11 0.644 2015-01-11 15:00:00 2
5 2015-02-04 12:30:00 35 0.664 2015-02-04 12:30:00 5
6 2015-02-04 13:00:00 35 0.647 2015-02-04 13:00:00 5
If you want to work on "7 consecutive days" you will need to identify the groups of 7 days. Again, there are different ways to do this, check what the modulo operator does and how to apply this to your DOY.
operating on your groups/bins
You describe looking for the maximum per bin (7 days ~ week).
{dplyr} offers for such problems grouped operations. Read up on them:
df %>%
  group_by(bin) %>%
  summarise(MaxValue = max(Value)) # we create a new variable and assign the max of each group to it
# A tibble: 2 x 2
bin MaxValue
<dbl> <dbl>
1 2 0.669
2 5 0.664
Obviously, you can perform many operations (summaries of your bins/groups).
Note: You can create bins on multiple variables. Read up on group_by() and summarise(..., .groups = "drop"), if you want to use this interim tibble for further calculations.
Hope this gets you started.
clarification on grouping by 7 days
If you have a sequence of (integer) numbers, there is a neat way to group it into n-element bins, i.e. using integer division.
In your case the data comes already with a day-of-year DOY variable. For completeness: with lubridate you can pull the DOY from a timestamp with the function yday(), i.e. df %>% mutate(DOY = yday(timestamp)).
# let's use integer division to group our DOYs into bins of 7
##--------- does not look at date or day
##--------- DOY 0-6 := 0, DOY 7-13 := 1, ..., DOY 35-41 := 5
df <- df %>%
  mutate(bin = DOY %/% 7)
This yields:
# A tibble: 6 x 6
Date time DOY Value timestamp bin
<chr> <chr> <dbl> <dbl> <dttm> <dbl>
1 2015-01-08 12:30:00 8 0.664 2015-01-08 12:30:00 1
2 2015-01-08 13:00:00 8 0.647 2015-01-08 13:00:00 1
3 2015-01-11 14:00:00 11 0.669 2015-01-11 14:00:00 1
4 2015-01-11 15:00:00 11 0.644 2015-01-11 15:00:00 1
5 2015-02-04 12:30:00 35 0.664 2015-02-04 12:30:00 5
6 2015-02-04 13:00:00 35 0.647 2015-02-04 13:00:00 5
And then build your max summary as before on the (new) grouping variable:
df %>%
  group_by(bin) %>%
  summarise(MaxValue = max(Value))
# A tibble: 2 x 2
bin MaxValue
<dbl> <dbl>
1 1 0.669
2 5 0.664
For the example data given, the result is identical. However, with your full dataset and the offset between "weeks" (with their defined start date) vs cutting your DOYs into bins of 7 consecutive days, you will get a different summary (unless the first day of the week (*) coincides with DOY 1).
(*): in lubridate you can set weeks to start Monday or Sunday as a parameter (in case you ever need this).
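If you instead need a true rolling window (the maximum over the 7 days centred on each observation, rather than fixed bins), the slider package can index the window by date. A sketch under that interpretation, using a small data frame like the one built above:

```r
library(dplyr)
library(slider)

df <- tibble::tibble(
  timestamp = as.POSIXct(c("2015-01-08 12:30:00", "2015-01-11 14:00:00",
                           "2015-02-04 12:30:00"), tz = "UTC"),
  Value = c(0.664, 0.669, 0.647)
)

df <- df %>%
  # for each row, take the max of Value over timestamps within +/- 3 days
  mutate(Max7 = slide_index_dbl(Value, as.Date(timestamp), max,
                                .before = 3, .after = 3))
```

Unlike binning, this gives every row the maximum of its own surrounding 7-day window, so rows near a bin boundary are handled symmetrically.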

Mean over a certain time-period in R

I have hourly data of CO2 values and I would like to know what is the CO2 concentration during the night (e.g. 9pm-7am). A reproducible example:
library(tidyverse); library(lubridate)
times <- seq(ymd_hms("2020-01-01 08:00:00"),
ymd_hms("2020-01-04 08:00:00"), by = "1 hours")
values <- runif(length(times), 1, 15)
df <- tibble(times, values)
How to get mean nightime values (e.g. between 9pm and 7am)? Of course I can filter like this:
df <- df %>%
filter(!hour(times) %in% c(8:20))
And then give id to each observation during the night
df$ID <- rep(LETTERS[1:round(nrow(df)/11)],
times = 1, each = 11)
And finally group and summarise
df_grouped <- df %>%
group_by(., ID) %>%
summarise(value_mean =mean(values))
But I am sure this is not a good way. How can I do this better, especially the part where we assign an ID to the nighttime values?
You can use data.table::frollmean to get the means for a certain time window. In your case you want the means of the last 10 hours, so we set the n argument of the function to 10:
> df$means <- data.table::frollmean(df$values, 10)
> head(df, 20)
# A tibble: 20 x 3
times values means
<dttm> <dbl> <dbl>
1 2020-01-01 08:00:00 4.15 NA
2 2020-01-01 09:00:00 6.24 NA
3 2020-01-01 10:00:00 5.17 NA
4 2020-01-01 11:00:00 9.20 NA
5 2020-01-01 12:00:00 12.3 NA
6 2020-01-01 13:00:00 2.93 NA
7 2020-01-01 14:00:00 9.12 NA
8 2020-01-01 15:00:00 9.72 NA
9 2020-01-01 16:00:00 12.0 NA
10 2020-01-01 17:00:00 13.4 8.41
11 2020-01-01 18:00:00 10.2 9.01
12 2020-01-01 19:00:00 1.97 8.59
13 2020-01-01 20:00:00 11.9 9.26
14 2020-01-01 21:00:00 8.84 9.23
15 2020-01-01 22:00:00 10.1 9.01
16 2020-01-01 23:00:00 3.76 9.09
17 2020-01-02 00:00:00 9.98 9.18
18 2020-01-02 01:00:00 5.56 8.76
19 2020-01-02 02:00:00 5.22 8.09
20 2020-01-02 03:00:00 6.36 7.39
Each row of the means column is the mean of that row's value together with the previous 9 values. Of course there will be some NAs.
Maybe you should have a look at the tsibble package, built to manipulate time series.
You can parameterize the window from the times you want, but the data need to be evenly spaced to use this solution:
n <- diff(which(grepl('20:00:00|08:00:00', df$times))) + 1
n <- unique(n)
df$means <- data.table::frollmean(df$values, n)
> head(df, 20)
# A tibble: 20 x 3
times values means
<dttm> <dbl> <dbl>
1 2020-01-01 08:00:00 11.4 NA
2 2020-01-01 09:00:00 7.03 NA
3 2020-01-01 10:00:00 7.15 NA
4 2020-01-01 11:00:00 6.91 NA
5 2020-01-01 12:00:00 8.18 NA
6 2020-01-01 13:00:00 4.70 NA
7 2020-01-01 14:00:00 13.8 NA
8 2020-01-01 15:00:00 5.16 NA
9 2020-01-01 16:00:00 12.3 NA
10 2020-01-01 17:00:00 3.81 NA
11 2020-01-01 18:00:00 3.09 NA
12 2020-01-01 19:00:00 9.89 NA
13 2020-01-01 20:00:00 1.24 7.28
14 2020-01-01 21:00:00 8.07 7.02
15 2020-01-01 22:00:00 5.59 6.91
16 2020-01-01 23:00:00 5.77 6.81
17 2020-01-02 00:00:00 10.7 7.10
18 2020-01-02 01:00:00 3.44 6.73
19 2020-01-02 02:00:00 10.3 7.16
20 2020-01-02 03:00:00 4.61 6.45
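An alternative to both the manual ID and the rolling mean is to derive a "night" identifier directly: shift each timestamp back by 8 hours so that all hours of one night (21:00 through 07:00) fall on the same calendar date, then group on that date. A sketch using the question's sample data (the 8-hour shift is my choice to match the OP's 21:00-07:00 window):

```r
library(dplyr)
library(lubridate)

set.seed(1)
times <- seq(ymd_hms("2020-01-01 08:00:00"),
             ymd_hms("2020-01-04 08:00:00"), by = "1 hours")
values <- runif(length(times), 1, 15)
df <- tibble(times, values)

night_means <- df %>%
  filter(hour(times) >= 21 | hour(times) <= 7) %>%  # keep 9 pm-7 am only
  mutate(night = as.Date(times - hours(8))) %>%     # one night -> one date
  group_by(night) %>%
  summarise(value_mean = mean(values))
```

This avoids hand-counting rows per night: nights of unequal length (e.g. at the edges of the dataset, or with missing hours) still get the correct grouping.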

Some conditions in nested ifelse taken into account

I struggle with nested ifelse. I want to create a new variable using dplyr::mutate based on values of other variables. See the reproductible example below.
library(dplyr)
library(hms)
# make a test dataframe
datetime <- as.POSIXct(c("2015-01-26 10:10:00 UTC","2015-01-26 10:20:00 UTC","2015-01-26 10:30:00 UTC", "2015-01-26 10:40:00 UTC","2015-01-26 10:50:00 UTC","2015-01-26 11:00:00 UTC","2015-01-26 00:10:00 UTC","2015-01-26 11:20:00 UTC","2015-01-26 11:30:00 UTC","2017-03-10 10:00:00 UTC"))
time <- hms::as_hms(datetime)
pco2_corr <- c(90,135,181,226,272,317,363,NA,454,300)
State_Zero <- c(NA,NA,1,rep(NA,7))
State_Flush <- c(rep(NA,4),1,rep(NA,5))
z <- tibble(datetime, time, pco2_corr, State_Zero, State_Flush)
# now create a new variable
z <- z %>%
dplyr::mutate(pco2_corr_qf = ifelse(is.na(pco2_corr), 15,
ifelse((State_Zero >= 1 | State_Flush >= 1), 4,
ifelse(pco2_corr < 100 | pco2_corr > 450, 7,
ifelse((time >= "00:00:00" & time <= "01:30:00") |
(time >= "12:00:00" & time <= "13:00:00"), 16,
ifelse((datetime >= "2017-03-10 08:00:00" &
datetime < "2017-03-21 20:00:00"), 99,
1))))))
z
# A tibble: 10 x 6
datetime time pco2_corr State_Zero State_Flush pco2_corr_qf
<dttm> <time> <dbl> <dbl> <dbl> <dbl>
1 2015-01-26 10:10:00 10:10 90 NA NA NA
2 2015-01-26 10:20:00 10:20 135 NA NA NA
3 2015-01-26 10:30:00 10:30 181 1 NA 4
4 2015-01-26 10:40:00 10:40 226 NA NA NA
5 2015-01-26 10:50:00 10:50 272 NA 1 4
6 2015-01-26 11:00:00 11:00 317 NA NA NA
7 2015-01-26 00:10:00 00:10 363 NA NA NA
8 2015-01-26 11:20:00 11:20 NA NA NA 15
9 2015-01-26 11:30:00 11:30 454 NA NA NA
10 2017-03-10 10:00:00 10:00 300 NA NA NA
The first two ifelse work fine but the next three do not. The new variable pco2_corr_qf should not have any NA but values 7, 16, 99 and 1.
What am I doing wrong?
You are comparing time with a string, which gives incorrect output; convert it to the relevant class. We can use case_when, which is a better alternative to nested ifelse.
library(dplyr)
library(hms)
z %>%
  mutate(pco2_corr_qf = case_when(
    is.na(pco2_corr) ~ 15,
    State_Zero >= 1 | State_Flush >= 1 ~ 4,
    pco2_corr < 100 | pco2_corr > 450 ~ 7,
    (time >= as_hms("00:00:00") & time <= as_hms("01:30:00")) |
      (time >= as_hms("12:00:00") & time <= as_hms("13:00:00")) ~ 16,
    datetime >= as.POSIXct("2017-03-10 08:00:00") &
      datetime < as.POSIXct("2017-03-21 20:00:00") ~ 99,
    TRUE ~ 1))
# datetime time pco2_corr State_Zero State_Flush pco2_corr_qf
# <dttm> <time> <dbl> <dbl> <dbl> <dbl>
# 1 2015-01-26 10:10:00 10:10 90 NA NA 7
# 2 2015-01-26 10:20:00 10:20 135 NA NA 1
# 3 2015-01-26 10:30:00 10:30 181 1 NA 4
# 4 2015-01-26 10:40:00 10:40 226 NA NA 1
# 5 2015-01-26 10:50:00 10:50 272 NA 1 4
# 6 2015-01-26 11:00:00 11:00 317 NA NA 1
# 7 2015-01-26 00:10:00 00:10 363 NA NA 16
# 8 2015-01-26 11:20:00 11:20 NA NA NA 15
# 9 2015-01-26 11:30:00 11:30 454 NA NA 7
#10 2017-03-10 10:00:00 10:00 300 NA NA 99
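Besides the string comparison, note that the nested ifelse also produced NAs because a condition like State_Zero >= 1 evaluates to NA when State_Zero is NA, and ifelse propagates that NA into the result; in case_when, an NA condition simply does not match and evaluation falls through to the next clause. A minimal illustration:

```r
library(dplyr)

x <- c(NA, 1, 2)

# ifelse: an NA condition yields an NA result
ifelse(x >= 1, "big", "small")

# case_when: an NA condition falls through to the catch-all clause
case_when(x >= 1 ~ "big", TRUE ~ "small")
```

This is why the case_when version fills every row of pco2_corr_qf even though State_Zero and State_Flush are mostly NA.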

Daily select one hour of the day for analysis

I have a dataset of one year of hourly records. For the analysis, I need to extract separately, for each month of the year, each hour of the day: January 00h, January 01h, January 02h, January 03h, ..., March 21h, March 22h, March 23h, and so on.
Thanks in advance for any useful help!
I tried to select observations for specified hours of the day during a period with datetime, filter, subset, ...
The code below (filter with month(time)) generates a fatal error:
Error: unexpected ')' in "at<-subset(groenenborgerno, timestamp=hour(time) == 01))"
groenenborgerno$timestamp <- as.POSIXct(groenenborgerno$date, format="%Y-%m-%d %H:%M:%S")
library(lubridate)
january01 <- filter(atimeframe,
                    (month(time) == 01 & hour(time) == 01))
Since no data is provided, I will try to answer your question with sample data:
require(lubridate)
require(tidyverse)
## Create some sample data:
time_index <- seq(from = as.POSIXct("2017-01-01 07:00"),
                  to = as.POSIXct("2018-01-01 18:00"), by = "hour")
value <- rnorm(n = length(time_index))
data <- data.frame(time_index, value)
data <- data %>%
  mutate(hour = hour(time_index),
         month = month(time_index)) %>%
  group_by(month, hour)
data
# A tibble: 8,772 x 4
# Groups: month, hour [288]
time_index value hour month
<dttm> <dbl> <int> <dbl>
1 2017-01-01 07:00:00 -0.626 7 1
2 2017-01-01 08:00:00 0.184 8 1
3 2017-01-01 09:00:00 -0.836 9 1
4 2017-01-01 10:00:00 1.60 10 1
5 2017-01-01 11:00:00 0.330 11 1
6 2017-01-01 12:00:00 -0.820 12 1
7 2017-01-01 13:00:00 0.487 13 1
8 2017-01-01 14:00:00 0.738 14 1
9 2017-01-01 15:00:00 0.576 15 1
10 2017-01-01 16:00:00 -0.305 16 1
# ... with 8,762 more rows
and then just filter() the hour / month combination you would want like so:
data %>% filter(hour > 12 & month == 1)
# A tibble: 347 x 4
# Groups: month, hour [11]
time_index value hour month
<dttm> <dbl> <int> <dbl>
1 2017-01-01 13:00:00 0.487 13 1
2 2017-01-01 14:00:00 0.738 14 1
3 2017-01-01 15:00:00 0.576 15 1
4 2017-01-01 16:00:00 -0.305 16 1
5 2017-01-01 17:00:00 1.51 17 1
6 2017-01-01 18:00:00 0.390 18 1
7 2017-01-01 19:00:00 -0.621 19 1
8 2017-01-01 20:00:00 -2.21 20 1
9 2017-01-01 21:00:00 1.12 21 1
10 2017-01-01 22:00:00 -0.0449 22 1
# ... with 337 more rows
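Once the data are grouped by month and hour, you can also collapse all 288 month × hour combinations into one summary row each, instead of filtering them one combination at a time. A sketch reusing the same kind of sample data as above (the mean is just an example summary):

```r
library(dplyr)
library(lubridate)

set.seed(42)
time_index <- seq(from = as.POSIXct("2017-01-01 07:00"),
                  to = as.POSIXct("2018-01-01 18:00"), by = "hour")
data <- data.frame(time_index, value = rnorm(length(time_index)))

monthly_hourly <- data %>%
  mutate(hour = hour(time_index), month = month(time_index)) %>%
  group_by(month, hour) %>%
  # one summary row per month x hour combination: 12 * 24 = 288 rows
  summarise(mean_value = mean(value), .groups = "drop")
```

If you really need each combination as its own data frame (e.g. to feed separate model fits), group_split() on the grouped data returns a list of 288 tibbles.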
