R - more effective left_join [duplicate]

This question already has answers here:
Overlap join with start and end positions (5 answers)
Closed 1 year ago.
I have two dataframes: one contains names and ranges of limits (only a few hundred rows, 1000 at most), which need to be assigned to a "measurements" dataframe that can consist of millions of rows (or tens of millions).
Currently I am doing a left_join and then filtering on value to get the specific limit assigned to each measurement. This, however, is quite inefficient and costs a lot of resources. For larger dataframes, the code is unable to run at all.
Any ideas for more effective solutions would be helpful.
library(dplyr)
## this one has only a few hundred rows
df_limits <- read.table(text="Title station_id limit_from limit_to
Level_3_Low 1 0 70
Level_2_Low 1 70 90
Level_1_Low 1 90 100
Optimal 1 100 110
Level_1_High 1 110 130
Level_2_High 1 130 150
Level_3_High 1 150 180
Level_3_Low 2 0 70
Level_2_Low 2 70 90
Level_1_Low 2 90 100
Optimal 2 100 110
Level_1_High 2 110 130
Level_2_High 2 130 150
Level_3_High 2 150 180
Level_3_Low 3 0 70
Level_2_Low 3 70 90
Level_1_Low 3 90 100
Optimal 3 100 110
Level_1_High 3 110 130
Level_2_High 3 130 150
Level_3_High 3 150 180
",header = TRUE, stringsAsFactors = TRUE)
# this DF has millions of rows
df_measurements <- read.table(text="measurement_id station_id value
12121534 1 172
12121618 1 87
12121703 1 9
12121709 2 80
12121760 2 80
12121813 2 115
12121881 3 67
12121907 3 100
12121920 3 108
12121979 1 102
12121995 1 53
12122022 1 77
12122065 2 158
12122107 2 144
12122113 2 5
12122135 3 100
12122187 3 136
12122267 3 130
12122359 1 105
12122366 1 126
12122398 1 143
",header = TRUE, stringsAsFactors = TRUE)
df_results <- left_join(df_measurements, df_limits, by = "station_id") %>%
  filter((value >= limit_from & value < limit_to) | is.na(Title)) %>%
  select(names(df_measurements), Title)
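For reference, newer dplyr versions (1.1.0 and later) can express this directly as a non-equi join with join_by(), which avoids materializing the full per-station join before filtering. A minimal sketch, not part of the original answers:
# sketch only: requires dplyr >= 1.1.0 for join_by()
library(dplyr)
df_results <- left_join(
  df_measurements, df_limits,
  by = join_by(station_id, value >= limit_from, value < limit_to)
) %>%
  select(all_of(names(df_measurements)), Title)
This keeps the boundary convention used above (value >= limit_from, value < limit_to), and unmatched measurements still get an NA Title.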

Another data.table solution using non-equi joins:
library(data.table)
setDT(df_measurements)
setDT(df_limits)
df_limits[df_measurements,
          .(station_id, measurement_id, value, Title),
          on = .(station_id = station_id, limit_from < value, limit_to >= value)]
station_id measurement_id value Title
1: 1 12121534 172 Level_3_High
2: 1 12121618 87 Level_2_Low
3: 1 12121703 9 Level_3_Low
4: 2 12121709 80 Level_2_Low
5: 2 12121760 80 Level_2_Low
6: 2 12121813 115 Level_1_High
7: 3 12121881 67 Level_3_Low
8: 3 12121907 100 Level_1_Low
9: 3 12121920 108 Optimal
10: 1 12121979 102 Optimal
11: 1 12121995 53 Level_3_Low
12: 1 12122022 77 Level_2_Low
13: 2 12122065 158 Level_3_High
14: 2 12122107 144 Level_2_High
15: 2 12122113 5 Level_3_Low
16: 3 12122135 100 Level_1_Low
17: 3 12122187 136 Level_2_High
18: 3 12122267 130 Level_1_High
19: 1 12122359 105 Optimal
20: 1 12122366 126 Level_1_High
21: 1 12122398 143 Level_2_High

A simple base R option (no additional packages needed) using subset + merge
subset(
  merge(
    df_measurements,
    df_limits,
    all = TRUE
  ),
  limit_from < value & limit_to >= value
)
gives
station_id measurement_id value Title limit_from limit_to
7 1 12121534 172 Level_3_High 150 180
9 1 12121618 87 Level_2_Low 70 90
15 1 12121703 9 Level_3_Low 0 70
23 1 12122022 77 Level_2_Low 70 90
34 1 12122398 143 Level_2_High 130 150
39 1 12121979 102 Optimal 100 110
43 1 12121995 53 Level_3_Low 0 70
54 1 12122366 126 Level_1_High 110 130
60 1 12122359 105 Optimal 100 110
65 2 12121760 80 Level_2_Low 70 90
75 2 12121813 115 Level_1_High 110 130
79 2 12121709 80 Level_2_Low 70 90
91 2 12122065 158 Level_3_High 150 180
97 2 12122107 144 Level_2_High 130 150
99 2 12122113 5 Level_3_Low 0 70
108 3 12121907 100 Level_1_Low 90 100
116 3 12121920 108 Optimal 100 110
124 3 12122267 130 Level_1_High 110 130
127 3 12121881 67 Level_3_Low 0 70
136 3 12122135 100 Level_1_Low 90 100
146 3 12122187 136 Level_2_High 130 150
Another option is using dplyr:
df_measurements %>%
  group_by(station_id) %>%
  mutate(Title = with(
    df_limits,
    Title[
      findInterval(
        value,
        unique(unlist(cbind(limit_from, limit_to)[station_id == first(.$station_id)])),
        left.open = TRUE
      )
    ]
  )) %>%
  ungroup()
which gives
# A tibble: 21 x 4
measurement_id station_id value Title
<int> <int> <int> <fct>
1 12121534 1 172 Level_3_High
2 12121618 1 87 Level_2_Low
3 12121703 1 9 Level_3_Low
4 12121709 2 80 Level_2_Low
5 12121760 2 80 Level_2_Low
6 12121813 2 115 Level_1_High
7 12121881 3 67 Level_3_Low
8 12121907 3 100 Level_1_Low
9 12121920 3 108 Optimal
10 12121979 1 102 Optimal
# ... with 11 more rows
Benchmarking
f_TIC1 <- function() {
  subset(
    merge(
      df_measurements,
      df_limits,
      all = TRUE
    ),
    limit_from < value & limit_to >= value
  )
}
f_TIC2 <- function() {
  df_measurements %>%
    group_by(station_id) %>%
    mutate(Title = with(
      df_limits,
      Title[
        findInterval(
          value,
          unique(unlist(cbind(limit_from, limit_to)[station_id == first(station_id)])),
          left.open = TRUE
        )
      ]
    )) %>%
    ungroup()
}
dt_limits <- as.data.table(df_limits)
dt_measurements <- as.data.table(df_measurements)
f_Waldi <- function() {
  dt_limits[
    dt_measurements,
    .(station_id, measurement_id, value, Title),
    on = .(station_id, limit_from < value, limit_to >= value)
  ]
}
f_TimTeaFan <- function() {
  setkey(dt_limits, station_id, limit_from, limit_to)
  foverlaps(dt_measurements[, value2 := value],
            dt_limits,
            by.x = c("station_id", "value", "value2"),
            type = "within"
  )[
    value < limit_to,
    .(measurement_id, station_id, value, Title)
  ]
}
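The relative timings below were presumably produced with the microbenchmark package; the exact call is not shown in the original, but a minimal sketch would be:
library(microbenchmark)
# run each candidate 100 times and report timings relative to the fastest
bm <- microbenchmark(f_TIC1(), f_TIC2(), f_Waldi(), f_TimTeaFan(), times = 100)
print(bm, unit = "relative")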
you will see that
Unit: relative
expr min lq mean median uq max neval
f_TIC1() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
f_TIC2() 4.848639 4.909985 4.895588 4.942616 5.124704 2.580819 100
f_Waldi() 3.182027 3.010615 3.069916 3.114160 3.397845 1.698386 100
f_TimTeaFan() 5.523778 5.112872 5.226145 5.112407 5.745671 2.446987 100

Here is one way to do it. The problematic part was the condition value < limit_to: foverlaps checks for the condition value <= limit_to, which results in double matches at the boundaries, so here we apply the filter condition after the overlapping join and then select the desired columns. Note that the result is not in the same order as the df_results generated with dplyr.
library(data.table)
dt_limits <- as.data.table(df_limits)
dt_measurements <- as.data.table(df_measurements)
setkey(dt_limits, station_id, limit_from, limit_to)
dt_results <- foverlaps(dt_measurements[, value2 := value],
                        dt_limits,
                        by.x = c("station_id", "value", "value2"),
                        type = "within"
)[value < limit_to,
  .(measurement_id, station_id, value, Title)]
dt_results[]
#> measurement_id station_id value Title
#> 1: 12121534 1 172 Level_3_High
#> 2: 12121618 1 87 Level_2_Low
#> 3: 12121703 1 9 Level_3_Low
#> 4: 12121709 2 80 Level_2_Low
#> 5: 12121760 2 80 Level_2_Low
#> 6: 12121813 2 115 Level_1_High
#> 7: 12121881 3 67 Level_3_Low
#> 8: 12121907 3 100 Optimal
#> 9: 12121920 3 108 Optimal
#> 10: 12121979 1 102 Optimal
#> 11: 12121995 1 53 Level_3_Low
#> 12: 12122022 1 77 Level_2_Low
#> 13: 12122065 2 158 Level_3_High
#> 14: 12122107 2 144 Level_2_High
#> 15: 12122113 2 5 Level_3_Low
#> 16: 12122135 3 100 Optimal
#> 17: 12122187 3 136 Level_2_High
#> 18: 12122267 3 130 Level_2_High
#> 19: 12122359 1 105 Optimal
#> 20: 12122366 1 126 Level_1_High
#> 21: 12122398 1 143 Level_2_High
#> measurement_id station_id value Title
Created on 2021-08-09 by the reprex package (v0.3.0)

Related

R - Reducing a matrix

I have a square matrix that looks like this:
A <- c("111","111","111","112","112","113")
B <- c(100,10,20,NA,NA,10)
C <- c(10,20,40,NA,10,20)
D <- c(10,20,NA,NA,40,200)
E <- c(20,20,40,10,10,20)
F <- c(NA,NA,40,100,10,20)
G <- c(10,20,NA,30,10,20)
df <- data.frame(A,B,C,D,E,F,G)
names(df) <- c("Codes","111","111","111","112","112","113")
# Codes 111 111 111 112 112 113
# 1 111 100 10 10 20 NA 10
# 2 111 10 20 20 20 NA 20
# 3 111 20 40 NA 40 40 NA
# 4 112 NA NA NA 10 100 30
# 5 112 NA 10 40 10 10 10
# 6 113 10 20 200 20 20 20
I want to reduce it so that observations with the same row and column names are summed up.
So I want to end up with:
# Codes 111 112 113
# 1 111 230 120 30
# 2 112 50 130 40
# 3 113 230 40 20
I tried to first combine the rows with the same "Codes" number, but I was having a lot of trouble.
In tidyverse:
library(tidyverse)
df %>%
  pivot_longer(-Codes, values_drop_na = TRUE) %>%
  group_by(Codes, name) %>%
  summarise(value = sum(value), .groups = 'drop') %>%
  pivot_wider()
# A tibble: 3 x 4
Codes `111` `112` `113`
<chr> <dbl> <dbl> <dbl>
1 111 230 120 30
2 112 50 130 40
3 113 230 40 20
One way in base R:
tapply(unlist(df[-1]), list(names(df)[-1][col(df[-1])], df[,1][row(df[-1])]), sum, na.rm = TRUE)
111 112 113
111 230 50 230
112 120 130 40
113 30 40 20
Note that this can be simplified, as noted by @thelatemail, to
grp <- expand.grid(df$Codes, names(df)[-1])
tapply(unlist(df[-1]), grp, FUN=sum, na.rm=TRUE)
You can also use xtabs:
xtabs(vals~., na.omit(cbind(grp, vals = unlist(df[-1]))))
Var2
Var1 111 112 113
111 230 120 30
112 50 130 40
113 230 40 20
When dealing with actual matrices, especially large ones, expressing the operation as (sparse) linear algebra should be most efficient.
library(Matrix) ## for sparse matrix operations
idx <- c("111","111","111","112","112","113")
mat <- matrix(c(100,10,20,NA,NA,10,
10,20,40,NA,10,20,
10,20,NA,NA,40,200,
20,20,40,10,10,20,
NA,NA,40,100,10,20,
10,20,NA,30,10,20),
nrow=length(idx),
byrow=TRUE, dimnames=list(idx, idx))
## convert NA's to zero
mat[is.na(mat)] <- 0
## examine matrix
mat
## 111 111 111 112 112 113
## 111 100 10 20 0 0 10
## 111 10 20 40 0 10 20
## 111 10 20 0 0 40 200
## 112 20 20 40 10 10 20
## 112 0 0 40 100 10 20
## 113 10 20 0 30 10 20
## indicator matrix
## converts between "code" and "idx" spaces
M_code_idx <- fac2sparse(idx)
## project to "code_code" space
M_code_idx %*% mat %*% t(M_code_idx)
## 3 x 3 Matrix of class "dgeMatrix"
## 111 112 113
## 111 230 50 230
## 112 120 130 40
## 113 30 40 20

How to use mutate_at() with two sets of variables, in R

Using dplyr, I want to divide a column by another one, where the two columns have a similar pattern.
I have the following data frame:
My_data = data.frame(
var_a = 101:110,
var_b = 201:210,
number_a = 1:10,
number_b = 21:30)
I would like to create new variables: var_a_new = var_a/number_a, var_b_new = var_b/number_b, and so on if I have c, d, etc. This is my attempt:
My_data %>%
  mutate_at(
    .vars = c('var_a', 'var_b'),
    .funs = list(new = function(x) x / (.[, paste0('number_a', names(x))])))
I did not get an error, but I got a wrong result. I think the problem is that I don't understand what the 'x' is. Is it one of the strings in .vars? Is it a column in My_data? Something else?
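For context (not from the original post): each function in .funs receives the selected column itself as x, a bare vector with no name attached, so the function cannot tell which column it was given. A minimal sketch illustrating this:
library(dplyr)
# x is the column vector itself (e.g. My_data$var_a), so this
# creates var_a_new = var_a/2 and var_b_new = var_b/2
My_data %>%
  mutate_at(
    .vars = c('var_a', 'var_b'),
    .funs = list(new = function(x) x / 2))
Since names(x) is NULL for a bare column, the paste0() lookup in the attempt above cannot select the matching number_ column.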
One option could be:
bind_cols(My_data,
          My_data %>%
            transmute(across(starts_with("var")) / across(starts_with("number"))) %>%
            rename_all(~ paste0(., "_new")))
var_a var_b number_a number_b var_a_new var_b_new
1 101 201 1 21 101.00000 9.571429
2 102 202 2 22 51.00000 9.181818
3 103 203 3 23 34.33333 8.826087
4 104 204 4 24 26.00000 8.500000
5 105 205 5 25 21.00000 8.200000
6 106 206 6 26 17.66667 7.923077
7 107 207 7 27 15.28571 7.666667
8 108 208 8 28 13.50000 7.428571
9 109 209 9 29 12.11111 7.206897
10 110 210 10 30 11.00000 7.000000
You can do this directly, provided the columns are correctly ordered, meaning "var_a" is the first column in the "var" group and "number_a" is the first column in the "number" group, and so on for the other pairs.
var_cols <- grep('var', names(My_data), value = TRUE)
number_cols <- grep('number', names(My_data), value = TRUE)
My_data[paste0(var_cols, '_new')] <- My_data[var_cols]/My_data[number_cols]
My_data
# var_a var_b number_a number_b var_a_new var_b_new
#1 101 201 1 21 101.00000 9.571429
#2 102 202 2 22 51.00000 9.181818
#3 103 203 3 23 34.33333 8.826087
#4 104 204 4 24 26.00000 8.500000
#5 105 205 5 25 21.00000 8.200000
#6 106 206 6 26 17.66667 7.923077
#7 107 207 7 27 15.28571 7.666667
#8 108 208 8 28 13.50000 7.428571
#9 109 209 9 29 12.11111 7.206897
#10 110 210 10 30 11.00000 7.000000
The function across() has replaced scoped variants such as mutate_at(), summarize_at() and others. For more details, see vignette("colwise") or https://cran.r-project.org/web/packages/dplyr/vignettes/colwise.html. Based on tmfmnk's answer, the following works well:
My_data %>%
  mutate(
    new = across(starts_with("var")) / across(starts_with("number")))
The prefix "new." will be added to the names of the new variables.
var_a var_b number_a number_b new.var_a new.var_b
1 101 201 1 21 101.00000 9.571429
2 102 202 2 22 51.00000 9.181818
3 103 203 3 23 34.33333 8.826087
4 104 204 4 24 26.00000 8.500000
5 105 205 5 25 21.00000 8.200000
6 106 206 6 26 17.66667 7.923077
7 107 207 7 27 15.28571 7.666667
8 108 208 8 28 13.50000 7.428571
9 109 209 9 29 12.11111 7.206897
10 110 210 10 30 11.00000 7.000000

Conditionally subtracting numbers

I have a data frame like this:
test <- data.frame(gr=rep(letters[1:2],each=6),No=c(100:105,200:205))
gr No
1 a 100
2 a 101
3 a 102
4 a 103
5 a 104
6 a 105
7 b 200
8 b 201
9 b 202
10 b 203
11 b 204
12 b 205
In the No column the numbers are increasing within each gr. I need to add 100 to gr a and 50 to gr b, but with a consecutive decrease after this operation.
In other words, I would like a new column that decreases consecutively as No increases. So I tried:
decrease_func <- function(No, gr) {
  if (any(gr == "a")) {
    No + 100
  } else {
    No + 50
  }
}
test %>%
  group_by(gr) %>%
  mutate(new_column = decrease_func(No, gr))
# A tibble: 12 x 3
# Groups: gr [2]
gr No new_column
<fct> <int> <dbl>
1 a 100 200
2 a 101 201
3 a 102 202
4 a 103 203
5 a 104 204
6 a 105 205
7 b 200 250
8 b 201 251
9 b 202 252
10 b 203 253
11 b 204 254
12 b 205 255
but what I need is this:
gr No new_column
<fct> <int> <dbl>
1 a 100 200
2 a 101 199
3 a 102 198
4 a 103 197
5 a 104 196
6 a 105 195
7 b 200 250
8 b 201 249
9 b 202 248
10 b 203 247
11 b 204 246
12 b 205 245
I cannot figure out how to get the consecutive decrease.
Thanks.
Not the most elegant answer, but in the meantime this may work:
library(dplyr)
test %>%
  mutate(A = case_when(gr == "a" ~ 100,
                       gr == "b" ~ 50,
                       TRUE ~ NA_real_)) %>%
  group_by(gr) %>%
  mutate(B = (1:NROW(gr) - 1) * 2,
         New_Column = No + A - B)
# A tibble: 12 x 5
# Groups: gr [2]
gr No A B New_Column
<fct> <int> <dbl> <dbl> <dbl>
1 a 100 100 0 200
2 a 101 100 2 199
3 a 102 100 4 198
4 a 103 100 6 197
5 a 104 100 8 196
6 a 105 100 10 195
7 b 200 50 0 250
8 b 201 50 2 249
9 b 202 50 4 248
10 b 203 50 6 247
11 b 204 50 8 246
12 b 205 50 10 245
Add select(gr, No, New_Column) at the end of the chain (as shown below) to get only gr, No and New_Column. I left the other columns in just to show what's going on.
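That is, the full chain would read:
test %>%
  mutate(A = case_when(gr == "a" ~ 100,
                       gr == "b" ~ 50,
                       TRUE ~ NA_real_)) %>%
  group_by(gr) %>%
  mutate(B = (1:NROW(gr) - 1) * 2,
         New_Column = No + A - B) %>%
  select(gr, No, New_Column)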
And if you want to wrap it into a function, you could do something like:
desc_func <- function(group_var, condition, if_true_add, if_false_add, to_number) {
  ifelse(
    group_var == condition,
    to_number + if_true_add - (1:NROW(group_var) - 1) * 2,
    to_number + if_false_add - (1:NROW(group_var) - 1) * 2)
}
test %>%
  group_by(gr) %>%
  mutate(test_var = desc_func(gr, "a", 100, 50, No))
# A tibble: 12 x 3
# Groups: gr [2]
gr No test_var
<fct> <int> <dbl>
1 a 100 200
2 a 101 199
3 a 102 198
4 a 103 197
5 a 104 196
6 a 105 195
7 b 200 250
8 b 201 249
9 b 202 248
10 b 203 247
11 b 204 246
12 b 205 245
Here is a way to do this in base R (No %% 100 serves as the within-group offset here because each group's No starts at a multiple of 100):
test$New <- with(test, No + c(100, 50)[cumsum(!duplicated(gr))] - 2*(No %% 100))
test$New
#[1] 200 199 198 197 196 195 250 249 248 247 246 245
Or a slight variation with match
with(test, No + c(100, 50)[match(gr, unique(gr))] - 2*(No %% 100))
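Both versions lean on the fact that No starts at a multiple of 100 in each group. A sketch that drops that assumption by computing the within-group position with ave() (not from the original answer):
# seq_along per group gives the 1-based position regardless of No's values
with(test, No + c(100, 50)[match(gr, unique(gr))] - 2 * (ave(No, gr, FUN = seq_along) - 1))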

Convert mapply output to dataframe variable

I have a data frame like this:
df <- data.frame(x=c(7,5,4),y=c(100,100,100),w=c(170,170,170),z=c(132,720,1256))
I create a new column using mapply:
set.seed(123)
library(truncnorm)
df$res <- mapply(rtruncnorm,df$x,df$y,df$w,df$z,25)
So, I got:
> df
#x y w z res
#1 7 100 170 132 117.9881, 126.2456, 133.7627, 135.2322, 143.5229, 100.3735, 114.8287
#2 5 100 170 720 168.8581, 169.4955, 169.6461, 169.8998, 169.0343
#3 4 100 170 1256 169.7245, 167.6744, 169.7025, 169.4441
#dput(df)
df <- structure(list(x = c(7, 5, 4), y = c(100, 100, 100), w = c(170,
170, 170), z = c(132, 720, 1256), res = list(c(117.988108836195,
126.245562762918, 133.762709785614, 135.232193379024, 143.52290514973,
100.373469134837, 114.828678702662), c(168.858147661715, 169.495493758985,
169.646123183828, 169.899849943838, 169.034333943479), c(169.724470294466,
167.674371713068, 169.70250974042, 169.444134892323))), .Names = c("x",
"y", "w", "z", "res"), row.names = c(NA, -3L), class = "data.frame")
But what I really need is to repeat each row of the df dataframe according to the df$res result, as follows:
> df2
# x y w z res
#1 7 100 170 132 117.9881
#2 7 100 170 132 126.2456
#3 7 100 170 132 133.7627
#4 7 100 170 132 135.2322
#5 7 100 170 132 143.5229
#6 7 100 170 132 100.3735
#7 7 100 170 132 114.8287
#8 5 100 170 720 168.8581
#9 5 100 170 720 169.4955
#10 5 100 170 720 169.6461
#11 5 100 170 720 169.8998
#12 5 100 170 720 169.0343
#13 4 100 170 1256 169.7245
#14 4 100 170 1256 167.6744
#15 4 100 170 1256 169.7025
#16 4 100 170 1256 169.4441
How do I achieve this efficiently? I need to apply this to a big dataframe.
library(truncnorm)
df <- data.frame(x = c(7,5,4), y = c(100,100,100), w = c(170,170,170), z = c(132,720,1256))
set.seed(123)
l <- mapply(rtruncnorm, df$x, df$y, df$w, df$z, 25)
cbind.data.frame(df[rep(seq_along(l), lengths(l)), ],
                 res = unlist(l))
# x y w z res
# 1 7 100 170 132 117.9881
# 1.1 7 100 170 132 126.2456
# 1.2 7 100 170 132 133.7627
# 1.3 7 100 170 132 135.2322
# 1.4 7 100 170 132 143.5229
# 1.5 7 100 170 132 100.3735
# 1.6 7 100 170 132 114.8287
# 2 5 100 170 720 168.8581
# 2.1 5 100 170 720 169.4955
# 2.2 5 100 170 720 169.6461
# 2.3 5 100 170 720 169.8998
# 2.4 5 100 170 720 169.0343
# 3 4 100 170 1256 169.7245
# 3.1 4 100 170 1256 167.6744
# 3.2 4 100 170 1256 169.7025
# 3.3 4 100 170 1256 169.4441
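For what it's worth, the same expansion can also be written with tidyr's unnest(); a sketch, assuming res is stored as a list column as in the dput above:
library(tidyr)
# unnest() repeats the other columns once per element of the res list column
df2 <- unnest(df, cols = res)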
Try this based on your given df:
df$res <- sapply(df$res, paste0, collapse=",")
do.call(rbind, apply(df, 1, function(x) do.call(expand.grid, strsplit(x, ","))))
# x y w z res
# 1 7 100 170 132 117.988108836195
# 2 7 100 170 132 126.245562762918
# 3 7 100 170 132 133.762709785614
# 4 7 100 170 132 135.232193379024
# 5 7 100 170 132 143.52290514973
# 6 7 100 170 132 100.373469134837
# 7 7 100 170 132 114.828678702662
# 8 5 100 170 720 168.858147661715
# 9 5 100 170 720 169.495493758985
# 10 5 100 170 720 169.646123183828
# 11 5 100 170 720 169.899849943838
# 12 5 100 170 720 169.034333943479
# 13 4 100 170 1256 169.724470294466
# 14 4 100 170 1256 167.674371713068
# 15 4 100 170 1256 169.70250974042
# 16 4 100 170 1256 169.444134892323

R: Error while calculating Rolling Median and Rolling Mean

I am trying to calculate 3-period rolling means and rolling medians for the following data:
SiteID Month TotalSessions TotalMinutes
1 201401 132 1334
1 201402 159 2498
1 201403 98 734
1 201404 112 909
2 201402 25 220
2 201404 32 407
4 201401 10 77
4 201402 12 112
4 201403 9 59
However, I am getting an error when I use the following function (rollmedian comes from the zoo package):
library(zoo)
ave(mydf$TotalSessions, mydf$SiteID, FUN = function(x) rollmedian(x, k = 3, align = "right", na.pad = TRUE))
Error: k <= n is not TRUE
I understand that the error occurs because some SiteIDs have fewer than 3 periods of data, and hence the rolling median cannot be calculated.
My question is: is there a way to add the missing months with 0s in TotalSessions and TotalMinutes, so that the data would look as follows:
SiteID Month TotalSessions TotalMinutes
1 201401 132 1334
1 201402 159 2498
1 201403 98 734
1 201404 112 909
2 201401 0 0
2 201402 25 220
2 201403 0 0
2 201404 32 407
4 201401 10 77
4 201402 12 112
4 201403 9 59
4 201404 0 0
Thanks for the help!
Personally, I would use one of the solutions proposed in the other answer or in the comments.
Here is an answer that modifies your data by adding 0 for the missing months (the desired output). I mainly use the merge function.
xx <- data.frame(Month = unique(dat$Month))
res <- do.call(rbind,
               by(dat, dat$SiteID, function(x) merge(x, xx, all.y = TRUE)))
res[is.na(res)] <- 0
# Month SiteID TotalSessions TotalMinutes
# 1.1 201401 1 132 1334
# 1.2 201402 1 159 2498
# 1.3 201403 1 98 734
# 1.4 201404 1 112 909
# 2.1 201401 0 0 0
# 2.2 201402 2 25 220
# 2.3 201403 0 0 0
# 2.4 201404 2 32 407
# 4.1 201401 4 10 77
# 4.2 201402 4 12 112
# 4.3 201403 4 9 59
# 4.4 201404 0 0 0
Padding with NAs would be better, but even better than that is zoo's rollapply with partial = TRUE:
library(zoo)
ave(mydf$TotalSessions, mydf$SiteID,
    FUN = function(x) rollapply(x, 3, median, align = "right", partial = TRUE))
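As an aside, if you do want the zero-filled frame from the question, the padding itself can also be expressed with tidyr's complete(); a sketch, assuming the data lives in mydf:
library(tidyr)
# creates every SiteID x Month combination, filling the new rows with 0
mydf_padded <- complete(mydf, SiteID, Month,
                        fill = list(TotalSessions = 0, TotalMinutes = 0))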
