Using lubridate function to conditionally change year in date - r

So I am currently struggling to change the year part of a date using an if statement.
My date column looks like this (there are many more values in the real dataframe, so it cannot be a manual solution):
Date
<S3: POSIXct>
2020-10-31
2020-12-31
I essentially want to create an if command (or any other method) such that if the month is less than 12 (i.e. not December), then the year is increased by 1.
So far I have tried using the lubridate package to do:
if (month(df$Date) < 12) {
  df$Date <- df$Date %m+% years(1)
}
However, the condition does not behave as expected: one year is added to every row, including those where the month is 12.
Any help would be greatly appreciated!

The problem is that if is not vectorized: it only considers the first element of its condition (and in recent versions of R a condition of length greater than one is an error). Use a logical index instead. You could approach it like this:
library(lubridate)
dates <- seq.Date(from = as.Date('2015-01-01'),
                  to = Sys.Date(),
                  by = 'months')
indicator <- month(dates) < 12
dates2 <- dates
dates2[indicator] <- dates[indicator] %m+% years(1)
res <- data.frame(dates, indicator, dates2)
> head(res, 25)
        dates indicator     dates2
1  2015-01-01      TRUE 2016-01-01
2  2015-02-01      TRUE 2016-02-01
3  2015-03-01      TRUE 2016-03-01
4  2015-04-01      TRUE 2016-04-01
5  2015-05-01      TRUE 2016-05-01
6  2015-06-01      TRUE 2016-06-01
7  2015-07-01      TRUE 2016-07-01
8  2015-08-01      TRUE 2016-08-01
9  2015-09-01      TRUE 2016-09-01
10 2015-10-01      TRUE 2016-10-01
11 2015-11-01      TRUE 2016-11-01
12 2015-12-01     FALSE 2015-12-01
13 2016-01-01      TRUE 2017-01-01
14 2016-02-01      TRUE 2017-02-01
15 2016-03-01      TRUE 2017-03-01
16 2016-04-01      TRUE 2017-04-01
17 2016-05-01      TRUE 2017-05-01
18 2016-06-01      TRUE 2017-06-01
19 2016-07-01      TRUE 2017-07-01
20 2016-08-01      TRUE 2017-08-01
21 2016-09-01      TRUE 2017-09-01
22 2016-10-01      TRUE 2017-10-01
23 2016-11-01      TRUE 2017-11-01
24 2016-12-01     FALSE 2016-12-01
25 2017-01-01      TRUE 2018-01-01

Found a more direct solution which doesn't use if commands.
df$Date[month(df$Date)<12] <- df$Date[month(df$Date)<12] %m+% years(1)
This still uses the month() function from the lubridate package to build the condition.
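If you prefer a vectorized one-liner in the tidyverse, dplyr's if_else() does the same thing and, unlike base ifelse(), preserves the date class. A minimal sketch, assuming a dataframe df as in the question:
library(dplyr)
library(lubridate)
df <- df %>%
  mutate(Date = if_else(month(Date) < 12, Date %m+% years(1), Date))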

Related

R: how can I split one row of a time period into multiple rows based on day and time

I am trying to split rows in an Excel file based on day and time. The data is from a study in which participants wear a tracking watch. Each row of the data set starts when a participant puts on the watch (variable 'Wear Time Start') and ends when they take the device off (variable 'Wear Time End').
I need to calculate how many hours each participant wore the device on each day (NOT per time period in one row).
Data set before split:
ID WearStart WearEnd
1 01 2018-05-14 09:00:00 2018-05-14 20:00:00
2 01 2018-05-14 21:30:00 2018-05-15 02:00:00
3 01 2018-05-15 07:00:00 2018-05-16 22:30:00
4 01 2018-05-16 23:00:00 2018-05-16 23:40:00
5 01 2018-05-17 01:00:00 2018-05-19 15:00:00
6 02 ...
Some explanation about the data set before the split: the 'WearStart' and 'WearEnd' columns are of class POSIXlt.
Desired output after split:
ID WearStart WearEnd Interval
1 01 2018-05-14 09:00:00 2018-05-14 20:00:00 11
2 01 2018-05-14 21:30:00 2018-05-15 00:00:00 2.5
3 01 2018-05-15 00:00:00 2018-05-15 02:00:00 2
4 01 2018-05-15 07:00:00 2018-05-16 00:00:00 17
5 01 2018-05-16 00:00:00 2018-05-16 22:30:00 22.5
4 01 2018-05-16 23:00:00 2018-05-16 23:40:00 0.4
5 01 2018-05-17 01:00:00 2018-05-18 00:00:00 23
6 01 2018-05-18 00:00:00 2018-05-19 00:00:00 24
7 01 2018-05-19 00:00:00 2018-05-19 15:00:00 15
Then I need to accumulate hours based on day:
ID Wear_Day Total_Hours
1 01 2018-05-14 13.5
2 01 2018-05-15 19
3 01 2018-05-16 22.9
4 01 2018-05-17 23
5 01 2018-05-18 24
4 01 2018-05-19 15
So, I reworked the entire answer. Please review the code; I am pretty sure this is what you want.
Short summary
The problem is that you need to split rows whose start and end fall on different dates, and you need to do this recursively. So I split the dataframe into a list of 1-row dataframes. For each one I check whether start and end are on the same day. If not, I turn it into a 2-row dataframe with adjusted start and end times. This is then split up again into a list of 1-row dataframes, and so on.
In the end there is a nested list of 1-row dataframes in which start and end are on the same day, and this list is then recursively bound together again.
# Load Packages ---------------------------------------------------------------------------------------------------
library(tidyverse)
library(lubridate)
df <- tribble(
  ~ID, ~WearStart,            ~WearEnd
  , 01, "2018-05-14 09:00:00", "2018-05-14 20:00:00"
  , 01, "2018-05-14 21:30:00", "2018-05-15 02:00:00"
  , 01, "2018-05-15 07:00:00", "2018-05-16 22:30:00"
  , 01, "2018-05-16 23:00:00", "2018-05-16 23:40:00"
  , 01, "2018-05-17 01:00:00", "2018-05-19 15:00:00"
)
df <- df %>% mutate_at(vars(starts_with("Wear")), ymd_hms)
# Helper Functions ------------------------------------------------------------------------------------------------
endsOnOtherDay <- function(df){
  as_date(df$WearStart) != as_date(df$WearEnd)
}
split1rowInto2Days <- function(df){
  df1 <- df
  df2 <- df
  # the first part ends one millisecond before midnight, the second starts at midnight
  df1$WearEnd <- as_date(df1$WearStart) + days(1) - milliseconds(1)
  df2$WearStart <- as_date(df2$WearStart) + days(1)
  rbind(df1, df2)
}
splitDates <- function(df){
  if (nrow(df) > 1){
    return(df %>%
             split(f = 1:nrow(df)) %>%
             lapply(splitDates) %>%
             reduce(rbind))
  }
  if (df %>% endsOnOtherDay()){
    return(df %>%
             split1rowInto2Days() %>%
             splitDates())
  }
  df
}
# The actual Calculation ------------------------------------------------------------------------------------------
df %>%
  splitDates() %>%
  mutate(wearDuration = difftime(WearEnd, WearStart, units = "hours")
         , wearDay = as_date(WearStart)) %>%
  group_by(ID, wearDay) %>%
  summarise(wearDuration_perDay = sum(wearDuration))
ID wearDay wearDuration_perDay
<dbl> <date> <drtn>
1 1 2018-05-14 13.50000 hours
2 1 2018-05-15 19.00000 hours
3 1 2018-05-16 23.16667 hours
4 1 2018-05-17 23.00000 hours
5 1 2018-05-18 24.00000 hours
6 1 2018-05-19 15.00000 hours
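The wearDuration_perDay column is a difftime ("drtn"); if plain numbers are wanted for further processing, a small follow-up converts it (a sketch, assuming the pipeline above was assigned to a hypothetical variable result):
result %>%
  mutate(wearDuration_perDay = as.numeric(wearDuration_perDay))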
Here is my solution to your question using just base R functions:
#step 1: read data from file
d <- read.csv("dt.csv", header = TRUE)
d
ID WearStart WearEnd
1 1 2018-05-14 09:00:00 2018-05-14 20:00:00
2 1 2018-05-14 21:30:00 2018-05-15 02:00:00
3 1 2018-05-15 07:00:00 2018-05-16 22:30:00
4 1 2018-05-16 23:00:00 2018-05-16 23:40:00
5 1 2018-05-17 01:00:00 2018-05-19 15:00:00
6 2 2018-05-16 11:30:00 2018-05-16 11:40:00
7 2 2018-05-16 22:05:00 2018-05-22 22:42:00
#step 2: change class of WearStart and WearEnd to POSIXlt
d$WearStart <- as.POSIXlt(d$WearStart, tryFormats = "%Y-%m-%d %H:%M")
d$WearEnd <- as.POSIXlt(d$WearEnd, tryFormats = "%Y-%m-%d %H:%M")
#step 3: calculate time interval (days and hours) for each record
timeInt <- function(d) {
  WearStartDay <- as.Date(d$WearStart, "%Y/%m/%d")
  Interval_days <- as.numeric(difftime(d$WearEnd, d$WearStart, units = "days"))
  Days <- WearStartDay + seq(0, Interval_days, 1)
  N_FullBTWDays <- length(Days) - 2
  if (N_FullBTWDays >= 0) {
    sd <- d$WearStart
    sd_h <- 24 - sd$hour - 1
    sd_m <- (60 - sd$min) / 60
    sd_total <- sd_h + sd_m
    hours <- sd_total
    hours <- c(hours, rep(24, N_FullBTWDays))
    ed <- d$WearEnd
    ed_h <- ed$hour
    ed_m <- ed$min / 60
    ed_total <- ed_h + ed_m
    hours <- c(hours, ed_total)
  } else {
    hours <- as.numeric(difftime(d$WearEnd, d$WearStart, units = "hours"))
  }
  df <- data.frame(id = rep(d$ID, length(Days)), days = Days, hours = hours)
  return(df)
}
df <- data.frame(matrix(ncol = 3, nrow = 0))
colnames(df) <- c("id", "days", "hours")
for (i in 1:nrow(d)) {
  df <- rbind(df, timeInt(d[i, ]))
}
id days hours
1 1 2018-05-14 11.0000000
2 1 2018-05-14 4.5000000
3 1 2018-05-15 17.0000000
4 1 2018-05-16 22.5000000
5 1 2018-05-16 0.6666667
6 1 2018-05-17 23.0000000
7 1 2018-05-18 24.0000000
8 1 2018-05-19 15.0000000
9 2 2018-05-16 0.1666667
10 2 2018-05-16 1.9166667
11 2 2018-05-17 24.0000000
12 2 2018-05-18 24.0000000
13 2 2018-05-19 24.0000000
14 2 2018-05-20 24.0000000
15 2 2018-05-21 24.0000000
16 2 2018-05-22 22.7000000
#daily usage of device for each customer
res <- as.data.frame(tapply(df$hours, list(df$days,df$id), sum))
res[is.na(res)] <- 0
res$date <- rownames(res)
res
1 2 date
2018-05-14 15.50000 0.000000 2018-05-14
2018-05-15 17.00000 0.000000 2018-05-15
2018-05-16 23.16667 2.083333 2018-05-16
2018-05-17 23.00000 24.000000 2018-05-17
2018-05-18 24.00000 24.000000 2018-05-18
2018-05-19 15.00000 24.000000 2018-05-19
2018-05-20 0.00000 24.000000 2018-05-20
2018-05-21 0.00000 24.000000 2018-05-21
2018-05-22 0.00000 22.700000 2018-05-22
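If the long format from the question (ID, Wear_Day, Total_Hours) is preferred over this wide table, a base R reshape finishes the job; a sketch that stays package-free like the rest of this answer:
long <- data.frame(ID = rep(c(1, 2), each = nrow(res)),
                   Wear_Day = rep(res$date, 2),
                   Total_Hours = c(res[["1"]], res[["2"]]))
long <- long[long$Total_Hours > 0, ]  # drop days a customer did not wear the device
long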

R: data.table returns an empty row when it actually exists

I have data that contains dates.
newdata <- data.table(example.dates)
> newdata
start_date paid_date
1: 2014-08-01 2015-09-24
2: 2015-08-01 2015-10-22
3: 2015-10-01 2015-12-45
4: 2015-11-01 2016-03-23
5: 2016-12-01 2017-02-06
---
100: 2018-02-05 2018-04-28
101: 2018-03-02 2018-07-18
102: 2018-06-14 2018-10-13
103: 2018-08-16 2018-11-04
104: 2018-10-19 2018-11-22
I have a function that calculates the difference between dates in months:
library(magrittr)  # for the pipe
library(zoo)       # for as.yearmon()
difference_month <- function(new_date, old_date) {
  start_date <- old_date %>% as.Date() %>% as.yearmon()
  end_date <- new_date %>% as.Date() %>% as.yearmon()
  diff_mon <- (end_date - start_date) * 12
  return(diff_mon)
}
and added a 'diff' column to the newdata table.
newdata[,diff := difference_month(paid_date,start_date)]
> newdata
start_date paid_date diff
1: 2014-08-01 2015-09-24 13
2: 2015-08-01 2015-10-22 2
3: 2015-10-01 2015-12-45 2
4: 2015-11-01 2016-03-23 4
5: 2016-12-01 2017-02-06 2
---
100: 2018-02-05 2018-04-28 2
101: 2018-03-02 2018-07-18 4
102: 2018-06-14 2018-10-13 4
103: 2018-08-16 2018-11-04 3
104: 2018-10-19 2018-11-22 1
But this is what appears when I want to see the rows that have a 2-month difference:
> newdata[diff == 2]
Empty data.table (0 rows) of 3 cols: start_date,paid_date,diff
However, it works when I take the value from a row that contains a 2-month difference and use that to find all the rows with a 2-month difference.
x <- newdata[2][[3]]
> newdata[diff == x]
start_date paid_date diff
1: 2015-08-01 2015-10-22 2
2: 2015-10-01 2015-12-45 2
3: 2016-12-01 2017-02-06 2
4: 2018-02-05 2018-04-28 2
I checked with str() and 'diff' is numeric.
Why does this return empty rows when 2-month differences actually exist?
newdata[diff == 2]
Using newdata from the Note at the end: floating-point calculations can produce rounding error, so round the result at the end as shown below. Also note that as.yearmon can convert the date columns directly, so as.Date is not needed.
library(data.table)
library(zoo)
newdata[, diff := round(12 * (as.yearmon(paid_date) - as.yearmon(start_date)))]
newdata[diff == 2]
giving:
start_date paid_date diff
1: 2015-08-01 2015-10-22 2
2: 2016-12-01 2017-02-06 2
3: 2018-02-05 2018-04-28 2
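To see the rounding error directly, here is a minimal sketch (the exact digits depend on the platform's floating-point arithmetic):
library(zoo)
d1 <- as.yearmon(as.Date("2015-08-01"))
d2 <- as.yearmon(as.Date("2015-10-01"))
x <- (d2 - d1) * 12
print(x, digits = 17)  # likely very close to, but not exactly, 2
x == 2                 # typically FALSE, which is why the filter returned nothing
round(x) == 2          # TRUE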
Note
The input in reproducible form:
Lines <- "
start_date paid_date
2014-08-01 2015-09-24
2015-08-01 2015-10-22
2015-10-01 2015-12-45
2015-11-01 2016-03-23
2016-12-01 2017-02-06
2018-02-05 2018-04-28
2018-03-02 2018-07-18
2018-06-14 2018-10-13
2018-08-16 2018-11-04
2018-10-19 2018-11-22"
library(data.table)
newdata <- fread(Lines)

R: calculate number of occurrences which have started but not ended - count if within a datetime range

I've got a dataset with the following shape
ID Start Time End Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00
I would like to count, every 15 minutes for an entire year, how many items have started but not finished, i.e. count the number of rows with a start time less than or equal to the time I'm looking at and an end time greater than it.
I'm looking for an approach using tidyverse/dplyr if possible.
Any help or guidance would be very much appreciated.
If I understand correctly, the OP wants to count the number of simultaneously active events.
One possibility to tackle this question is the coverage() function from Bioconductor's IRanges package. Another is to aggregate in a non-equi join, which is available in the data.table package.
Non-equi join
# create sequence of datetimes (limited to 4 days for demonstration)
seq15 <- seq(lubridate::as_datetime("2017-01-01"),
             lubridate::as_datetime("2017-01-05"), by = "15 mins")
# aggregate within a non-equi join
library(data.table)
result <- periods[.(time = seq15), on = .(Start.Time <= time, End.Time > time),
                  .(time, count = sum(!is.na(ID))), by = .EACHI][, .(time, count)]
result
time count
1: 2017-01-01 00:00:00 0
2: 2017-01-01 00:15:00 1
3: 2017-01-01 00:30:00 1
4: 2017-01-01 00:45:00 1
5: 2017-01-01 01:00:00 1
---
381: 2017-01-04 23:00:00 0
382: 2017-01-04 23:15:00 0
383: 2017-01-04 23:30:00 0
384: 2017-01-04 23:45:00 0
385: 2017-01-05 00:00:00 0
The result can be visualized graphically:
library(ggplot2)
ggplot(result) + aes(time, count) + geom_step()
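Since the question explicitly asked for a tidyverse approach, here is a hedged dplyr sketch of the same counting idea. It compares every grid time against every period, so it scales worse than the non-equi join, but it is fine for moderately sized data; it uses the periods table defined in the Data section below:
library(dplyr)
library(purrr)
seq15 <- seq(lubridate::as_datetime("2017-01-01"),
             lubridate::as_datetime("2017-01-05"), by = "15 mins")
result2 <- tibble(time = seq15) %>%
  mutate(count = map_int(time, ~ sum(periods$Start.Time <= .x & periods$End.Time > .x)))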
Data
periods <- readr::read_table(
"ID Start.Time End.Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00"
)
# convert the date-time strings to class POSIXct
library(data.table)
cols <- names(periods)[names(periods) %like% "Time$"]
setDT(periods)[, (cols) := lapply(.SD, lubridate::dmy_hms), .SDcols = cols]
periods
ID Start.Time End.Time
1: 1 2017-01-01 00:15:00 2017-01-01 07:15:00
2: 2 2017-01-01 04:45:00 2017-01-01 06:15:00
3: 3 2017-01-01 10:20:00 2017-01-01 20:15:00
4: 4 2017-01-01 02:15:00 2017-01-01 00:15:00
5: 5 2017-01-02 15:15:00 2017-01-03 00:30:00
6: 6 2017-01-03 07:00:00 2017-01-04 09:15:00

Slow performance of split() function

I have a csv file consisting of around 200,000 rows of transactions. Here is the import and a little preprocessing of the data:
data <- read.csv("bitfinex_data/trades.csv", header=T)
data$date <- as.character(data$date)
data$date <- substr(data$date, 1, 10)
data$date <- as.numeric(data$date)
data$date <- as.POSIXct(data$date, origin="1970-01-01", tz = "GMT")
head(data)
id exchange symbol date price amount sell
1 24892563 bf btcusd 2018-01-02 00:00:00 13375 0.05743154 False
2 24892564 bf btcusd 2018-01-02 00:00:01 13374 0.12226129 False
3 24892565 bf btcusd 2018-01-02 00:00:02 13373 0.00489140 False
4 24892566 bf btcusd 2018-01-02 00:00:02 13373 0.07510860 False
5 24892567 bf btcusd 2018-01-02 00:00:02 13373 0.11606086 False
6 24892568 bf btcusd 2018-01-02 00:00:03 13373 0.47000000 False
My goal is to obtain hourly sums of the amount of tokens being traded. For this I need to split my data based on hours, which I did in the following way:
tmp <- split(data, cut(data$date,"hour"))
However, this takes way too long (up to 1 hour), and I wonder whether this is normal behaviour for functions such as split() and cut(). Is there any alternative to using those two functions?
UPDATE:
After using great suggestion by #Maurits Evers, my output table looks like this:
# A tibble: 25 x 2
date_hour amount.sum
<chr> <dbl>
1 1970-01-01 00 48.2
2 2018-01-02 00 2746.
3 2018-01-02 01 1552.
4 2018-01-02 02 2010.
5 2018-01-02 03 2171.
6 2018-01-02 04 3640.
7 2018-01-02 05 1399.
8 2018-01-02 06 836.
9 2018-01-02 07 856.
10 2018-01-02 08 819.
# ... with 15 more rows
This is exactly what I wanted, except for the first row, where the date is from the year 1970. Any suggestion on what might be causing the problem? I tried to change the origin parameter of the as.POSIXct() function, but that did not solve the problem.
I agree with #Roland's comment. To illustrate, here is an example.
Let's generate some data with 200000 entries in one minute time intervals.
set.seed(2018);
df <- data.frame(
  date = seq(from = as.POSIXct("2018-01-01 00:00"), by = "min", length.out = 200000),
  amount = runif(200000))
head(df);
# date amount
#1 2018-01-01 00:00:00 0.33615347
#2 2018-01-01 00:01:00 0.46372327
#3 2018-01-01 00:02:00 0.06058539
#4 2018-01-01 00:03:00 0.19743361
#5 2018-01-01 00:04:00 0.47431419
#6 2018-01-01 00:05:00 0.30104860
We now (1) create a new column date_hour that contains the date and hour part of the full date-time, (2) group_by column date_hour, and (3) sum the entries in column amount to give amount.sum.
library(dplyr)
df %>%
  mutate(date_hour = format(date, "%Y-%m-%d %H")) %>%
  group_by(date_hour) %>%
  summarise(amount.sum = sum(amount))
## A tibble: 3,333 x 2
# date_hour amount.sum
# <chr> <dbl>
# 1 2018-01-01 00 28.9
# 2 2018-01-01 01 26.4
# 3 2018-01-01 02 32.7
# 4 2018-01-01 03 29.9
# 5 2018-01-01 04 29.7
# 6 2018-01-01 05 28.5
# 7 2018-01-01 06 34.2
# 8 2018-01-01 07 33.8
# 9 2018-01-01 08 30.7
#10 2018-01-01 09 27.7
## ... with 3,323 more rows
This is very fast (it takes around 0.3 seconds on my 2012 MacBook Air), and you should be able to easily adjust this example to your particular case.
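Regarding the stray 1970-01-01 row in the update: as.POSIXct(x, origin = "1970-01-01") maps values near zero to the Unix epoch, so a plausible guess (without seeing the raw file) is that a few date strings did not begin with a valid 10-digit timestamp and came out as small numbers. One way to inspect those rows:
suspect <- data[which(format(data$date, "%Y") == "1970"), ]
head(suspect)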
You can compute hourly sums without any packages by using tapply. I use the random data suggested by Maurits Evers:
set.seed(2018)
df <- data.frame(
  date = seq(from = as.POSIXct("2018-01-01 00:00"),
             by = "min", length.out = 200000),
  amount = runif(200000))
head(df)
## date amount
## 1 2018-01-01 00:00:00 0.33615347
## 2 2018-01-01 00:01:00 0.46372327
## 3 2018-01-01 00:02:00 0.06058539
## 4 2018-01-01 00:03:00 0.19743361
## 5 2018-01-01 00:04:00 0.47431419
## 6 2018-01-01 00:05:00 0.30104860
tapply(df$amount,
       format(df$date, "%Y-%m-%d %H"),
       sum)
## 2018-01-01 00 2018-01-01 01 2018-01-01 02
## 28.85825 26.39385 32.73600
## 2018-01-01 03 2018-01-01 04 2018-01-01 05
## 29.88545 29.74048 28.46781
## ...
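tapply() returns a named vector; if a data frame is preferred for further processing, it converts easily (a small sketch):
hourly <- tapply(df$amount, format(df$date, "%Y-%m-%d %H"), sum)
hourly_df <- data.frame(date_hour = names(hourly),
                        amount.sum = as.numeric(hourly))
head(hourly_df)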

Interpolation of 15 minute values

I have a dataframe that looks like this:
dat <- data.frame(time = seq(as.POSIXct("2010-01-01"),
                             as.POSIXct("2016-12-31") + 60*99,
                             by = 60*15),
                  radiation = sample(1:500, 245383, replace = TRUE))
So I have a measurement value every 15 minutes. The structure is:
> str(dat)
'data.frame': 245383 obs. of 2 variables:
$ time : POSIXct, format: "2010-01-01 00:00:00" "2010-01-01 00:15:00" "2010-01-01 00:30:00" "2010-01-01 00:45:00" ...
$ radiation: num 230 443 282 314 286 225 77 89 97 330 ...
Now I want to interpolate, so my aim is a dataframe with values for every minute.
I searched a few times and tried some methods with the zoo package, but I have some problems with the dataframe. I have to convert it to a text file, I guess? I have no idea how to do that.
Here is a tidyverse solution.
library('tidyverse')
dat <- data.frame(time = seq(as.POSIXct("2010-01-01"),
                             as.POSIXct("2016-12-31") + 60*99,
                             by = 60*15),
                  radiation = sample(1:500, 245383, replace = TRUE))
dat <- head(dat, 3)
dat
# time radiation
# 1 2010-01-01 00:00:00 241
# 2 2010-01-01 00:15:00 438
# 3 2010-01-01 00:30:00 457
You can create a data frame with all of the required times. Using full_join will make the missing radiation values appear as NA, and approx will then fill the NAs with a linear approximation.
dat %>%
  full_join(data.frame(time = seq(
    from = min(.$time),
    to = max(.$time),
    by = 'min'))) %>%
  arrange(time) %>%
  mutate(radiation = approx(radiation, n = n())$y)
# Joining, by = "time"
# time radiation
# 1 2010-01-01 00:00:00 241.0000
# 2 2010-01-01 00:01:00 254.1333
# 3 2010-01-01 00:02:00 267.2667
# 4 2010-01-01 00:03:00 280.4000
# 5 2010-01-01 00:04:00 293.5333
# 6 2010-01-01 00:05:00 306.6667
# 7 2010-01-01 00:06:00 319.8000
# 8 2010-01-01 00:07:00 332.9333
# 9 2010-01-01 00:08:00 346.0667
# 10 2010-01-01 00:09:00 359.2000
# 11 2010-01-01 00:10:00 372.3333
# 12 2010-01-01 00:11:00 385.4667
# 13 2010-01-01 00:12:00 398.6000
# 14 2010-01-01 00:13:00 411.7333
# 15 2010-01-01 00:14:00 424.8667
# 16 2010-01-01 00:15:00 438.0000
# 17 2010-01-01 00:16:00 439.2667
# 18 2010-01-01 00:17:00 440.5333
# 19 2010-01-01 00:18:00 441.8000
# 20 2010-01-01 00:19:00 443.0667
# 21 2010-01-01 00:20:00 444.3333
# 22 2010-01-01 00:21:00 445.6000
# 23 2010-01-01 00:22:00 446.8667
# 24 2010-01-01 00:23:00 448.1333
# 25 2010-01-01 00:24:00 449.4000
# 26 2010-01-01 00:25:00 450.6667
# 27 2010-01-01 00:26:00 451.9333
# 28 2010-01-01 00:27:00 453.2000
# 29 2010-01-01 00:28:00 454.4667
# 30 2010-01-01 00:29:00 455.7333
# 31 2010-01-01 00:30:00 457.0000
You can use the approx function like this:
dat <- data.frame(time = seq(as.POSIXct("2016-12-01"),
                             as.POSIXct("2016-12-31") + 60*99,
                             by = 60*15),
                  radiation = sample(1:500, 2887, replace = TRUE))
mins <- seq(as.POSIXct("2016-12-01"),
            as.POSIXct("2016-12-31") + 60*99,
            by = 60)
out <- approx(dat$time, dat$radiation, mins)
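approx() returns a list with components x and y; a small sketch to put the result back into a data frame (note that with the default rule = 1, minutes past the last 15-minute observation come back as NA):
res <- data.frame(time = mins, radiation = out$y)
head(res)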
Here is a solution using pad from the padr package to fill the gaps in your time column. na.approx is used for interpolation.
library(padr)
library(zoo)
dat[1:2, ]
time radiation
#1 2010-01-01 00:00:00 133
#2 2010-01-01 00:15:00 187
dat_padded <- pad(dat[1:2, ], interval = "min")
dat_padded$radiation <- zoo::na.approx(dat_padded$radiation)
dat_padded
time radiation
#1 2010-01-01 00:00:00 133.0
#2 2010-01-01 00:01:00 136.6
#3 2010-01-01 00:02:00 140.2
#4 2010-01-01 00:03:00 143.8
#5 2010-01-01 00:04:00 147.4
#6 2010-01-01 00:05:00 151.0
#7 2010-01-01 00:06:00 154.6
#8 2010-01-01 00:07:00 158.2
#9 2010-01-01 00:08:00 161.8
#10 2010-01-01 00:09:00 165.4
#11 2010-01-01 00:10:00 169.0
#12 2010-01-01 00:11:00 172.6
#13 2010-01-01 00:12:00 176.2
#14 2010-01-01 00:13:00 179.8
#15 2010-01-01 00:14:00 183.4
#16 2010-01-01 00:15:00 187.0
data
set.seed(1)
dat <-
data.frame(
time = seq(
as.POSIXct("2010-01-01"),
as.POSIXct("2016-12-31") + 60 * 99,
by = 60 * 15
),
radiation = sample(1:500, 245383, replace = TRUE)
)
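To apply the same idea to the full data frame rather than just its first two rows, a sketch (na.rm = FALSE in na.approx keeps any leading or trailing NAs instead of dropping rows):
dat_padded_full <- pad(dat, interval = "min")
dat_padded_full$radiation <- zoo::na.approx(dat_padded_full$radiation, na.rm = FALSE)
head(dat_padded_full)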
