Replace NA with mean of adjacent values - r

I want to replace the NA values in the "return" column with the mean of the adjacent non-missing values, grouped by "id". Let's assume there are only two months in a year: 1 and 2.
df <- data.frame(id = c("A","A","A","A","B","B","B","B"),
                 year = c(2014,2014,2015,2015),
                 month = c(1, 2),
                 marketcap = c(4,6,2,6,23,2,5,34),
                 return = c(NA,0.23,0.2,0.1,0.4,0.9,NA,0.6))
df
  id year month marketcap return
1  A 2014     1         4     NA   # <-
2  A 2014     2         6   0.23
3  A 2015     1         2   0.20
4  A 2015     2         6   0.10
5  B 2014     1        23   0.40
6  B 2014     2         2   0.90
7  B 2015     1         5     NA   # <-
8  B 2015     2        34   0.60
Desired data
desired_df <- data.frame(id = c("A","A","A","A","B","B","B","B"),
                         year = c(2014,2014,2015,2015),
                         month = c(1,2),
                         marketcap = c(4,6,2,6,23,2,5,34),
                         return = c(0.23,0.23,0.2,0.1,0.4,0.9,0.75,0.6))
desired_df
  id year month marketcap return
1  A 2014     1         4   0.23   # <-
2  A 2014     2         6   0.23
3  A 2015     1         2   0.20
4  A 2015     2         6   0.10
5  B 2014     1        23   0.40
6  B 2014     2         2   0.90
7  B 2015     1         5   0.75   # <-
8  B 2015     2        34   0.60
The second NA (row 7) should be replaced by the mean of the values before and after, i.e. (0.9 + 0.6)/2 = 0.75.
Note that the first NA (row 1) has no previous data. Here NA should be replaced with the next non-missing value, 0.23 ("next observation carried backward").
A data.table solution would be preferred, if possible.
UPDATE:
When I use the following code structure (which works for the sample data)
df[, returnInterpolate := na.approx(return, rule = 2), by = id]
I encounter this error:
Error in approx(x[!na], y[!na], xout, ...) :
  need at least two non-NA values to interpolate
I guess there may be some ids that have fewer than two non-NA values to interpolate. Any suggestions?
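To check that guess, the non-missing values can be counted per id first (a quick diagnostic sketch, assuming df is already a data.table as above):
# ids with fewer than two non-NA returns will break na.approx()
df[, .(nonNA = sum(!is.na(return))), by = id][nonNA < 2]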

library(data.table)
df <- data.frame(id = c("A","A","A","A","B","B","B","B"),
                 year = c(2014,2014,2015,2015),
                 month = c(1,2),
                 marketcap = c(4,6,2,6,23,2,5,34),
                 return = c(NA,0.23,0.2,0.1,0.4,0.9,NA,0.6))
setDT(df)
library(zoo)
df[, returnInterpol := na.approx(return, rule = 2), by = id]
# id year month marketcap return returnInterpol
#1: A 2014 1 4 NA 0.23
#2: A 2014 2 6 0.23 0.23
#3: A 2015 1 2 0.20 0.20
#4: A 2015 2 6 0.10 0.10
#5: B 2014 1 23 0.40 0.40
#6: B 2014 2 2 0.90 0.90
#7: B 2015 1 5 NA 0.75
#8: B 2015 2 34 0.60 0.60
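The rule = 2 argument is what handles the leading NA: approx() then extends the nearest non-missing value to the ends instead of returning NA. A small illustration:
na.approx(c(NA, 0.23, NA, 0.6), rule = 2)
# [1] 0.230 0.230 0.415 0.600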
Edit:
If you have groups with only NA values or only one non-NA value, you could do this:
df <- data.frame(id = c("A","A","A","A","B","B","B","B","C","C","C","C"),
                 year = c(2014,2014,2015,2015),
                 month = c(1,2),
                 marketcap = c(4,6,2,6,23,2,5,34, 1:4),
                 return = c(NA,0.23,0.2,0.1,0.4,0.9,NA,0.6,NA,NA,0.3,NA))
setDT(df)
df[, returnInterpol := switch(as.character(sum(!is.na(return))),
                              "0" = return,            # no data: leave the NAs
                              "1" = na.omit(return),   # one value: recycle it
                              na.approx(return, rule = 2)), by = id]
# id year month marketcap return returnInterpol
# 1: A 2014 1 4 NA 0.23
# 2: A 2014 2 6 0.23 0.23
# 3: A 2015 1 2 0.20 0.20
# 4: A 2015 2 6 0.10 0.10
# 5: B 2014 1 23 0.40 0.40
# 6: B 2014 2 2 0.90 0.90
# 7: B 2015 1 5 NA 0.75
# 8: B 2015 2 34 0.60 0.60
# 9: C 2014 1 1 NA 0.30
# 10: C 2014 2 2 NA 0.30
# 11: C 2015 1 3 0.30 0.30
# 12: C 2015 2 4 NA 0.30

The easy imputeTS solution, without caring for the ID, would be:
library("imputeTS")
na.interpolation(df)
Since the imputation should be done per ID, it is a little more complicated, since it seems there are often not enough values left when filtering by ID. I would take the solution Roland posted and use imputeTS::na.interpolation() where possible; in the other cases, the overall mean via imputeTS::na.mean() or a random guess within the overall bounds via imputeTS::na.random() could be used.
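A minimal sketch of that combination (assuming the dotted pre-1.x imputeTS function names used in this answer; the overall mean is computed manually here so that ids with no data at all are also covered):
library(data.table)
library(imputeTS)
setDT(df)
overall <- mean(df$return, na.rm = TRUE)           # fallback across all ids
df[, returnInterpol := if (sum(!is.na(return)) >= 2)
       na.interpolation(return)                    # enough points: interpolate
     else
       ifelse(is.na(return), overall, return),     # too few: overall mean
   by = id]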
In this case it might also be a very good idea to look beyond univariate time series interpolation/imputation. There are a lot of other variables that could help estimate the missing values (if they are correlated). Packages like Amelia could help here.

Related

counting NA from R Dataframe in a for loop

I have a time series dataframe in R from 2011 to 2018. How can I do a for loop where I count the number of NAs per year separately and, if that specific year has more than x% missing, drop that year or do something else?
Please refer to the image to see what my dataframe looks like:
https://i.stack.imgur.com/2fwDk.png
years_values <- 2011:2020
years <- pretty(years_values, n = 10)
count <- 0
for (y in years) {
  for (j in which(df$Year == y)) {
    if (is.na(df$Flow[j])) {
      count <- count + 1
    }
  }
  if (count > 1) {
    bfi <- BFI(df$Flow[df$Year == y])
  } else {
    bfi <- NA
  }
}
I am trying to use this code to loop over each year and count the NAs. If the NAs in a year exceed 1% I want to skip computing the BFI, and if they are below that, compute the BFI. I do have the BFI function working well; the problem I have is formulating this loop.
Since you have not included any reproducible data, let us take a simple example that captures the essence of your own data. We have a column called Year and one called Flow that contains some missing values:
df <- data.frame(Year = rep(2011:2013, each = 4),
Flow = c(1, 2, NA, NA, 5, 6, NA, 8, 9, 10, 11, 12))
df
#> Year Flow
#> 1 2011 1
#> 2 2011 2
#> 3 2011 NA
#> 4 2011 NA
#> 5 2012 5
#> 6 2012 6
#> 7 2012 NA
#> 8 2012 8
#> 9 2013 9
#> 10 2013 10
#> 11 2013 11
#> 12 2013 12
Now suppose we want to count the number of missing values in each year. We can use table and is.na, like this:
tab <- table(df$Year, is.na(df$Flow))
tab
#>
#> FALSE TRUE
#> 2011 2 2
#> 2012 3 1
#> 2013 4 0
We can see that these are the absolute counts of missing values, but we can convert this into proportions by dividing the second column by the row sums of this table:
props <- tab[,2] / rowSums(tab)
props
#> 2011 2012 2013
#> 0.50 0.25 0.00
Now, suppose we want to find and remove the years where more than 33% of cases are missing. We can just filter the values of props that are greater than 0.33 and get the associated year (or years):
years_to_drop <- names(props)[props > 0.33]
years_to_drop
#> [1] "2011"
Now we can use this to remove the years with more than 33% missing values from our original data frame by doing:
df[!df$Year %in% years_to_drop,]
#> Year Flow
#> 5 2012 5
#> 6 2012 6
#> 7 2012 NA
#> 8 2012 8
#> 9 2013 9
#> 10 2013 10
#> 11 2013 11
#> 12 2013 12
Created on 2022-11-14 with reprex v2.0.2
As Allan Cameron suggests, there's no need to use a loop, and R is usually more efficient working vectorially anyway.
I would suggest a solution based on ave (using the synthetic data from the previous answer)
df$NA_fraction <- ave(df$Flow, df$Year, FUN = \(values) mean(is.na(values)))
df
Year Flow NA_fraction
1 2011 1 0.50
2 2011 2 0.50
3 2011 NA 0.50
4 2011 NA 0.50
5 2012 5 0.25
6 2012 6 0.25
7 2012 NA 0.25
8 2012 8 0.25
9 2013 9 0.00
10 2013 10 0.00
11 2013 11 0.00
12 2013 12 0.00
You can then pick whatever threshold and filter by it
> df[df$NA_fraction < 0.3,]
Year Flow NA_fraction
5 2012 5 0.25
6 2012 6 0.25
7 2012 NA 0.25
8 2012 8 0.25
9 2013 9 0.00
10 2013 10 0.00
11 2013 11 0.00
12 2013 12 0.00
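For completeness, a dplyr equivalent of the same group-wise filter (a sketch, not part of the original answers):
library(dplyr)
df %>%
  group_by(Year) %>%
  filter(mean(is.na(Flow)) < 0.3) %>%   # keep years with under 30% missing
  ungroup()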

For loop in R to rewrite initial datasets

UPD:
Here is what I need:
Example of some datasets are here (I have 8 of them):
https://drive.google.com/drive/folders/1gBV2ZkywW6JqDjRICafCwtYhh2DHWaUq?usp=sharing
What I need is:
For example, in those datasets there is lev variable. Let's say this is a snapshot of the data in these datasets:
ID Year lev
1 2011 0.19
1 2012 0.19
1 2013 0.21
1 2014 0.18
2 2013 0.39
2 2014 0.15
2 2015 0.47
2 2016 0.35
3 2013 0.30
3 2015 0.1
3 2017 0.13
3 2018 0.78
4 2011 0.13
4 2012 0.35
Now, in each of my datasets (EE_AB, EE_C, EE_H, etc.), I need to create variables ff1 and ff2, where each ID's value in a given year is scaled relative to the median of lev across all IDs in that particular year.
Let's take an example of the year 2011. The median of the variable lev in this dataset in 2011 is (0.19+0.13)/2 = 0.16, so ff1 for ID 1 in 2011 should be 0.19/0.16 = 1.1875, and for ID 4 in 2011 ff1 = 0.13/0.16 = 0.8125.
Now let's take the example of 2013. The median lev is 0.3, so ff1 for IDs 1, 2, 3 will be 0.7, 1.3, and 1 respectively.
The desired output should be the ff1 variable in each dataset (e.g., EE_AB, EE_C, EE_H) as:
ID Year lev ff1
1 2011 0.19 1.1875
1 2012 0.19 0.7037
1 2013 0.21 0.7
1 2014 0.18 1.0909
2 2013 0.39 1.3
2 2014 0.15 0.9091
2 2015 0.47 1.6491
2 2016 0.35 1
3 2013 0.30 1
3 2015 0.1 0.3509
3 2017 0.13 1
3 2018 0.78 1
4 2011 0.13 0.8125
4 2012 0.35 1.2963
And this should be in the same way for other dataframes.
Here's a tidyverse method:
library(dplyr)
# library(purrr)
data_frameAB %>%
group_by(Year) %>%
mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
ungroup()
# # A tibble: 14 x 5
# ID Year c d ff1
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2011 10 12 2.2
# 2 1 2012 11 13 2.18
# 3 1 2013 12 14 2.17
# 4 1 2014 13 15 2.15
# 5 1 2015 14 16 2.14
# 6 1 2016 15 34 3.27
# 7 1 2017 16 25 2.56
# 8 1 2018 17 26 2.53
# 9 1 2019 18 56 4.11
# 10 15 2015 23 38 2.65
# 11 15 2016 26 25 1.96
# 12 15 2017 30 38 2.27
# 13 45 2011 100 250 3.5
# 14 45 2012 200 111 1.56
Without purrr, that inner expression would be
mutate(ff1 = (c+d) / mapply(median, c, d))
albeit without purrr's type-safety.
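Applied to the lev column from the question itself (a sketch, assuming a frame df with the ID/Year/lev columns shown in the snapshot), the same grouped idea reduces to scaling by the per-year median:
df %>%
  group_by(Year) %>%
  mutate(ff1 = lev / median(lev, na.rm = TRUE)) %>%
  ungroup()
# e.g. 2011: median(c(0.19, 0.13)) = 0.16, so ff1 = 0.19/0.16 = 1.1875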
Since you have multiple frames in your data management, I have two suggestions:
Combine them into a list. This recommendation stems from the assumption that whatever you're doing to one frame you are likely to do to all of them. In that case, you can use lapply or purrr::map on the list of frames, doing all frames in one step. See https://stackoverflow.com/a/24376207/3358227.
list_of_frames <- list(AB=data_frameAB, C=data_frameC, F=data_frameF)
list_of_frames2 <- purrr::map(
list_of_frames,
~ .x %>%
group_by(Year) %>%
mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>% ungroup()
)
Again, without purrr, that would be
list_of_frames2 <- lapply(
  list_of_frames,
  function(.x) group_by(.x, Year) %>%
    mutate(ff1 = (c+d) / mapply(median, c, d)) %>%
    ungroup()
)
Combine them into one frame, preserving the original data. Starting with list_of_frames,
bind_rows(list_of_frames, .id = "Frame") %>%
group_by(Frame, Year) %>%
mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
ungroup()
# # A tibble: 42 x 6
# Frame ID Year c d ff1
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AB 1 2011 10 12 2.2
# 2 AB 1 2012 11 13 2.18
# 3 AB 1 2013 12 14 2.17
# 4 AB 1 2014 13 15 2.15
# 5 AB 1 2015 14 16 2.14
# 6 AB 1 2016 15 34 3.27
# 7 AB 1 2017 16 25 2.56
# 8 AB 1 2018 17 26 2.53
# 9 AB 1 2019 18 56 4.11
# 10 AB 15 2015 23 38 2.65
# # ... with 32 more rows
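If the separate frames are needed again afterwards, the combined result can be split back apart by the Frame column (assuming the pipeline above is assigned to a name, say combined):
combined <- bind_rows(list_of_frames, .id = "Frame")   # plus the grouped mutate as above
frames_back <- split(combined, combined$Frame)         # one frame per original dataset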

Average percentage change over different years in R

I have a data frame from which I created a reproducible example:
country <- c('A','A','A','B','B','C','C','C','C')
year <- c(2010,2011,2015,2008,2009,2008,2009,2011,2015)
score <- c(1,2,2,1,4,1,1,3,2)
df <- data.frame(country, year, score)
country year score
1 A 2010 1
2 A 2011 2
3 A 2015 2
4 B 2008 1
5 B 2009 4
6 C 2008 1
7 C 2009 1
8 C 2011 3
9 C 2015 2
And I am trying to calculate the average percentage increase (or decrease) in the score for each country by calculating [(final score - initial score) / (initial score)] for each year and averaging it over the number of years.
country year score change
1 A 2010 1 NA
2 A 2011 2 1
3 A 2015 2 0
4 B 2008 1 NA
5 B 2009 4 3
6 C 2008 1 NA
7 C 2009 1 0
8 C 2011 3 2
9 C 2015 2 -0.33
The final result I am hoping to obtain:
country avg_change
1 A 0.5
2 B 3
3 C 0.55
As you can see, the trick is that countries span different years, sometimes with a missing year in between. I tried different ways to do it manually but struggled. If someone could hint at a solution, that would be great. Many thanks.
With dplyr, we can group_by country and get mean of difference between scores.
library(dplyr)
df %>%
group_by(country) %>%
summarise(avg_change = mean(c(NA, diff(score)), na.rm = TRUE))
# country avg_change
# <fct> <dbl>
#1 A 0.500
#2 B 3.00
#3 C 0.333
Using base R aggregate with same logic
aggregate(score~country, df, function(x) mean(c(NA, diff(x)), na.rm = TRUE))
We can use data.table to group by 'country' and take the mean of the difference between the 'score' and the lag of 'score'
library(data.table)
setDT(df)[, .(avg_change = mean(score - shift(score), na.rm = TRUE)), .(country)]
# country avg_change
#1: A 0.5000000
#2: B 3.0000000
#3: C 0.3333333
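Note that all three answers average the absolute differences, which gives 0.33 for country C rather than the 0.55 in the desired output. To average the fractional changes the question describes, divide each difference by the previous score; a data.table sketch:
setDT(df)[, .(avg_change = mean((score - shift(score)) / shift(score),
              na.rm = TRUE)), by = country]
# C now averages (0 + 2 - 1/3) / 3 = 0.556, matching the question's 0.55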

Grouped moving average in r

I'm trying to calculate a moving average in R over a particular field, but I need this moving average to be grouped by two or more other fields. The purpose of this new average is predictive analysis, so I need it to be trailing as well.
Any variables that do not have enough values to be averaged (such as student J) would ideally give either NA or its original Score value.
I've been trying rollapply and data.table and am having no luck!
I've provided the table of data and two moving averages (AVG2 with k=2 and AVG3 with k=3) to show exactly what I'm after. The moving average is on Score and the variables to group over are school, Student and area. Please help!
no school Student area Score AVG2 AVG3
1 I S A 5 NA NA
2 B S A 2 NA NA
3 B S A 7 NA NA
4 B O A 3 NA NA
5 B O B 9 NA NA
6 I O A 6 NA NA
7 I O B 3 NA NA
8 I S A 7 NA NA
9 I O A 1 NA NA
10 B S A 7 4.5 NA
11 I S A 3 NA NA
12 I O A 8 3.5 NA
13 B S A 3 7 5.33
14 I O A 4 4.5 5
15 B O A 1 NA NA
16 I S A 9 5 5
17 B S A 4 5 5.67
18 B O A 6 2 NA
19 I S A 3 6 6.33
20 I O B 8 NA NA
21 B S A 3 3.5 4.67
22 I O A 4 6 4.33
23 B O A 1 3.5 3.33
24 I S A 9 6 5
25 B S A 4 3.5 3.33
26 B O A 6 3.5 2.67
27 I J A 6 NA NA
Here is the code to recreate the initial table in R:
school <- c('I','B','B','B','B','I','I','I','I','B','I','I','B','I','B','I','B','B','I','I','B','I','B','I','B','B','I')
Student <- c('S','S','S','O','O','O','O','S','O','S','S','O','S','O','O','S','S','O','S','O','S','O','O','S','S','O','J')
area <- c('A','A','A','A','B','A','B','A','A','A','A','A','A','A','A','A','A','A','A','B','A','A','A','A','A','A','A')
Score <- c(5,2,7,3,9,6,3,7,1,7,3,8,3,4,1,9,4,6,3,8,3,4,1,9,4,6,6)
DF <- data.frame(school, Student, area, Score)
You can try solving the problem using dplyr and TTR, but for student J from school I it is not possible to calculate a moving average, as there is only one measurement.
AVG2 calculated with stats::filter gives the result you wanted, but I also added AVG2b calculated with TTR::SMA to show a simple moving average calculation where the current measurement is also taken into account.
library(dplyr)
library(TTR)
df <- data.frame(school, Student, Score)
df$AVG2 <- NA
df$AVG2b <- NA
df[!(df$school=="I" & df$Student=="J"),] <- df[!(df$school=="I" & df$Student=="J"),] %>%
group_by(school, Student) %>%
mutate(AVG2 = stats::filter(Score, c(0, 0.5, 0.5), sides = 1 ), AVG2b = SMA(Score, n= 2))
> df
school Student Score AVG2 AVG2b
1 I S 5 NA NA
2 B S 2 NA NA
3 B S 7 NA 4.5
4 B O 3 NA NA
5 B O 9 NA 6.0
6 I O 6 NA NA
7 I O 3 NA 4.5
8 I S 7 NA 6.0
9 I O 1 4.5 2.0
10 B S 7 4.5 7.0
...
Here is a rollapply solution. Note that it appears that you want the average of the prior two or three rows in the same group, i.e. excluding the data on the current row.
library(zoo)
roll <- function(x, n) {
if (length(x) <= n) NA
else rollapply(x, list(-seq(n)), mean, fill = NA)
}
transform(DF, AVG2 = ave(Score, school, Student, FUN = function(x) roll(x, 2)),
AVG3 = ave(Score, school, Student, FUN = function(x) roll(x, 3)))
giving:
school Student Score AVG2 AVG3
1 I S 5 NA NA
2 B S 2 NA NA
3 B S 7 NA NA
4 B O 3 NA NA
5 B O 9 NA NA
6 I O 6 NA NA
7 I O 3 NA NA
8 I S 7 NA NA
9 I O 1 4.5 NA
10 B S 7 4.5 NA
11 I S 3 6.0 NA
12 I O 8 2.0 3.333333
13 B S 3 7.0 5.333333
14 I O 4 4.5 4.000000
15 B O 1 6.0 NA
16 I S 9 5.0 5.000000
17 B S 4 5.0 5.666667
18 B O 6 5.0 4.333333
19 I S 3 6.0 6.333333
20 I O 8 6.0 4.333333
21 B S 3 3.5 4.666667
22 I O 4 6.0 6.666667
23 B O 1 3.5 5.333333
24 I S 9 6.0 5.000000
25 B S 4 3.5 3.333333
26 B O 6 3.5 2.666667
27 I J 6 NA NA
Update: Fixed roll.
Here is AVG2 calculation with data.table, which is faster compared to other approaches:
library(data.table)
dt <- data.table(DF)
setkey(dt, school, Student, area)
dt[, c("start", "len") := .(ifelse(.I + 1 > .I[.N], 0, .I + 1),
                            pmax(pmin(1, .I[.N] - .I - 1), 0)),
   by = .(school, Student, area)][
   , AVG2 := mean(dt$Score[start:(start + len)]), by = 1:nrow(dt)]
dt$AVG2[dt$len == 0] <- NA
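For reference, a more compact data.table sketch that reproduces the trailing AVG2 from the question (grouping by all three fields; shift() lags the rolling mean so the current row is excluded):
library(zoo)
dt2 <- as.data.table(DF)
dt2[, AVG2 := shift(rollapplyr(Score, 2, mean, fill = NA)),
    by = .(school, Student, area)]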

How to calculate growth rate in long format data frame?

With data structured as follows...
df <- data.frame(Category=c(rep("A",6),rep("B",6)),
Year=rep(2010:2015,2),Value=1:12)
I'm having a tough time creating a growth rate column (by year) within category. Can anyone help with code to create something like this...
Category Year Value Growth
A 2010 1
A 2011 2 1.000
A 2012 3 0.500
A 2013 4 0.333
A 2014 5 0.250
A 2015 6 0.200
B 2010 7
B 2011 8 0.143
B 2012 9 0.125
B 2013 10 0.111
B 2014 11 0.100
B 2015 12 0.091
For these sorts of questions ("how do I compute XXX by category YYY?"), there are always solutions based on by(), the data.table package, and plyr. I generally prefer plyr, which is often slower but (to me) more transparent/elegant.
df <- data.frame(Category=c(rep("A",6),rep("B",6)),
Year=rep(2010:2015,2),Value=1:12)
library(plyr)
ddply(df,"Category",transform,
Growth=c(NA,exp(diff(log(Value)))-1))
The main difference between this answer and #krlmr's is that I am using a geometric-mean trick (taking differences of logs and then exponentiating) while #krlmr computes an explicit ratio.
Mathematically, diff(log(Value)) is taking the differences of the logs, i.e. log(x[t+1])-log(x[t]) for all t. When we exponentiate that we get the ratio x[t+1]/x[t] (because exp(log(x[t+1])-log(x[t])) = exp(log(x[t+1]))/exp(log(x[t])) = x[t+1]/x[t]). The OP wanted the fractional change rather than the multiplicative growth rate (i.e. x[t+1]==x[t] corresponds to a fractional change of zero rather than a multiplicative growth rate of 1.0), so we subtract 1.
I am also using transform() for a little bit of extra "syntactic sugar", to avoid creating a new anonymous function.
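As a quick numeric check of that identity (not from the original answer):
x <- c(1, 2, 3)
exp(diff(log(x))) - 1        # 1.0 0.5
x[-1] / x[-length(x)] - 1    # identical: 1.0 0.5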
You can simply use the dplyr package:
> df %>% group_by(Category) %>% mutate(Growth = (Value - lag(Value))/lag(Value))
which will produce the following result:
# A tibble: 12 x 4
# Groups: Category [2]
Category Year Value Growth
<fct> <int> <int> <dbl>
1 A 2010 1 NA
2 A 2011 2 1
3 A 2012 3 0.5
4 A 2013 4 0.333
5 A 2014 5 0.25
6 A 2015 6 0.2
7 B 2010 7 NA
8 B 2011 8 0.143
9 B 2012 9 0.125
10 B 2013 10 0.111
11 B 2014 11 0.1
12 B 2015 12 0.0909
Using the base R function ave:
> df$Growth <- with(df, ave(Value, Category,
                            FUN = function(x) c(NA, diff(x)/x[-length(x)])))
> df
Category Year Value Growth
1 A 2010 1 NA
2 A 2011 2 1.00000000
3 A 2012 3 0.50000000
4 A 2013 4 0.33333333
5 A 2014 5 0.25000000
6 A 2015 6 0.20000000
7 B 2010 7 NA
8 B 2011 8 0.14285714
9 B 2012 9 0.12500000
10 B 2013 10 0.11111111
11 B 2014 11 0.10000000
12 B 2015 12 0.09090909
@Ben Bolker's answer is easily adapted to ave:
transform(df, Growth=ave(Value, Category,
FUN=function(x) c(NA,exp(diff(log(x)))-1)))
Very easy with plyr:
library(plyr)
ddply(df, .(Category),
function (d) {
d$Growth <- c(NA, tail(d$Value, -1) / head(d$Value, -1) - 1)
d
}
)
We have two problems here:
Splitting by category
Computing the growth rate
ddply is the workhorse, the split and the function to compute the growth rate is defined by parameters to this function.
A more elegant variant based on Ben's idea, using the new gdiff function in my kimisc package:
df <- data.frame(Category=c(rep("A",6),rep("B",6)),
Year=rep(2010:2015,2),Value=1:12)
library(plyr)
ddply(df, "Category", transform,
Growth=c(NA, kimisc::gdiff(Value, FUN = `/`)-1))
Here, gdiff is used to compute a lagged rate (instead of a lagged difference as diff would).
Many years later: the tsbox package aims to work with all kinds of time series objects, including data frames, and offers a standard time series toolkit. Calculating growth rates is as simple as:
df <- data.frame(Category=c(rep("A",6),rep("B",6)),
Year=rep(2010:2015,2),Value=1:12)
library(tsbox)
ts_pc(df)
#> [time]: 'Year' [value]: 'Value'
#> Category Year Value
#> 1 A 2010-01-01 NA
#> 2 A 2011-01-01 100.000000
#> 3 A 2012-01-01 50.000000
#> 4 A 2013-01-01 33.333333
#> 5 A 2014-01-01 25.000000
#> 6 A 2015-01-01 20.000000
#> 7 B 2010-01-01 NA
#> 8 B 2011-01-01 14.285714
#> 9 B 2012-01-01 12.500000
#> 10 B 2013-01-01 11.111111
#> 11 B 2014-01-01 10.000000
#> 12 B 2015-01-01 9.090909
The collapse package, available on CRAN, provides a fast, fully C/C++-based solution to these kinds of problems, with the generic function fgrowth and the associated growth operator G:
df <- data.frame(Category=c(rep("A",6),rep("B",6)),
Year=rep(2010:2015,2),Value=1:12)
library(collapse)
G(df, by = ~Category, t = ~Year)
Category Year G1.Value
1 A 2010 NA
2 A 2011 100.000000
3 A 2012 50.000000
4 A 2013 33.333333
5 A 2014 25.000000
6 A 2015 20.000000
7 B 2010 NA
8 B 2011 14.285714
9 B 2012 12.500000
10 B 2013 11.111111
11 B 2014 10.000000
12 B 2015 9.090909
# fgrowth is more of a programmer's function; you can do:
fgrowth(df$Value, 1, 1, df$Category, df$Year)
[1] NA 100.000000 50.000000 33.333333 25.000000 20.000000 NA 14.285714 12.500000 11.111111 10.000000 9.090909
# Which means: calculate the growth rate of Value, using 1 lag, iterated 1 time
# (you can compute arbitrary sequences of lagged/leaded and iterated growth
# rates with these functions), with groups identified by Category and time by Year.
fgrowth / G also has methods for the plm::pseries and plm::pdata.frame classes available in the plm package.
