Moving average by multiple groups - R

I have the following DF (demo). For each ID, I would like to compute the moving average of the previous 3 months of the AMOUNT column, using YEAR and MONTH for the ordering.
ID YEAR MONTH AMOUNT
1 ABC 2020 09 100
2 ABC 2020 11 200
3 ABC 2020 12 300
4 ABC 2021 01 400
5 ABC 2021 04 500
6 PQR 2020 10 100
7 PQR 2020 11 200
8 PQR 2020 12 300
9 PQR 2021 01 400
10 PQR 2021 03 500
The following is my attempt:
library(TTR)
library(dplyr)
DF %>% group_by(ID, YEAR, MONTH) %>% mutate(`3MA` = runMean(AMOUNT, 3))
This fails with the error "n = 3 is outside valid range", because grouping by ID, YEAR and MONTH leaves only one row per group, so there are never 3 values to average.
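A quick check (my own addition, not part of the original post) makes the problem visible: every (ID, YEAR, MONTH) combination is a single row, so a 3-period running mean cannot be computed within those groups.
library(dplyr)
DF %>%
  count(ID, YEAR, MONTH) %>%
  summarise(max_rows_per_group = max(n))   # returns 1, hence the runMean() error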
Desired Output:
ID YEAR MONTH AMOUNT 3MA
1 ABC 2020 09 100 NA
2 ABC 2020 11 200 NA
3 ABC 2020 12 300 NA
4 ABC 2021 01 400 200 (100+200+300)/3
5 ABC 2021 04 500 300 (400+300+200)/3
6 PQR 2020 10 100 NA
7 PQR 2020 11 200 NA
8 PQR 2020 12 300 NA
9 PQR 2021 01 400 200 (100+200+300)/3
10 PQR 2021 03 500 300 (400+300+200)/3

You can use the following code:
library(dplyr)
arrange(DF, ID, YEAR, MONTH) %>%
  group_by(ID) %>%
  mutate(lag1 = lag(AMOUNT),
         lag2 = lag(AMOUNT, 2),
         lag3 = lag(AMOUNT, 3),
         movave = (lag1 + lag2 + lag3) / 3)
#> # A tibble: 10 × 8
#> # Groups: ID [2]
#> ID YEAR MONTH AMOUNT lag1 lag2 lag3 movave
#> <chr> <int> <int> <int> <int> <int> <int> <dbl>
#> 1 ABC 2020 9 100 NA NA NA NA
#> 2 ABC 2020 11 200 100 NA NA NA
#> 3 ABC 2020 12 300 200 100 NA NA
#> 4 ABC 2021 1 400 300 200 100 200
#> 5 ABC 2021 4 500 400 300 200 300
#> 6 PQR 2020 10 100 NA NA NA NA
#> 7 PQR 2020 11 200 100 NA NA NA
#> 8 PQR 2020 12 300 200 100 NA NA
#> 9 PQR 2021 1 400 300 200 100 200
#> 10 PQR 2021 3 500 400 300 200 300
Created on 2022-07-02 by the reprex package (v2.0.1)

An option using a sliding window:
library(tidyverse)
library(slider)
df <- tribble(
~id, ~year, ~month, ~amount,
"ABC", 2020, 09, 100,
"ABC", 2020, 11, 200,
"ABC", 2020, 12, 300,
"ABC", 2021, 01, 400,
"ABC", 2021, 04, 500,
"PQR", 2020, 10, 100,
"PQR", 2020, 11, 200,
"PQR", 2020, 12, 300,
"PQR", 2021, 01, 400,
"PQR", 2021, 03, 500
)
df |>
arrange(id, year, month) |>
group_by(id) |>
mutate(ma3 = slide_dbl(lag(amount), mean, .before = 2, .complete = TRUE)) |>
ungroup() # if needed
#> # A tibble: 10 × 5
#> id year month amount ma3
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 ABC 2020 9 100 NA
#> 2 ABC 2020 11 200 NA
#> 3 ABC 2020 12 300 NA
#> 4 ABC 2021 1 400 200
#> 5 ABC 2021 4 500 300
#> 6 PQR 2020 10 100 NA
#> 7 PQR 2020 11 200 NA
#> 8 PQR 2020 12 300 NA
#> 9 PQR 2021 1 400 200
#> 10 PQR 2021 3 500 300
Created on 2022-07-02 by the reprex package (v2.0.1)

Here is a way.
suppressPackageStartupMessages({
library(dplyr)
library(TTR)
})
x <- ' ID YEAR MONTH AMOUNT
1 ABC 2020 09 100
2 ABC 2020 11 200
3 ABC 2020 12 300
4 ABC 2021 01 400
5 ABC 2021 04 500
6 PQR 2020 10 100
7 PQR 2020 11 200
8 PQR 2020 12 300
9 PQR 2021 01 400
10 PQR 2021 03 500 '
DF <- read.table(textConnection(x), header = TRUE)
DF %>%
arrange(ID, YEAR, MONTH) %>%
group_by(ID) %>%
mutate(`3MA` = lag(runMean(AMOUNT, 3)))
#> # A tibble: 10 × 5
#> # Groups: ID [2]
#> ID YEAR MONTH AMOUNT `3MA`
#> <chr> <int> <int> <int> <dbl>
#> 1 ABC 2020 9 100 NA
#> 2 ABC 2020 11 200 NA
#> 3 ABC 2020 12 300 NA
#> 4 ABC 2021 1 400 200
#> 5 ABC 2021 4 500 300
#> 6 PQR 2020 10 100 NA
#> 7 PQR 2020 11 200 NA
#> 8 PQR 2020 12 300 NA
#> 9 PQR 2021 1 400 200
#> 10 PQR 2021 3 500 300
Created on 2022-07-02 by the reprex package (v2.0.1)

Try this:
DF |>
  arrange(ID, YEAR, MONTH) |>
  group_by(ID) |>
  mutate(`3M` = (lag(AMOUNT) + lag(AMOUNT, 2) + lag(AMOUNT, 3)) / 3)
Output:
# A tibble: 10 × 5
# Groups: ID [2]
ID YEAR MONTH AMOUNT `3M`
<chr> <int> <int> <int> <dbl>
1 ABC 2020 9 100 NA
2 ABC 2020 11 200 NA
3 ABC 2020 12 300 NA
4 ABC 2021 1 400 200
5 ABC 2021 4 500 300
6 PQR 2020 10 100 NA
7 PQR 2020 11 200 NA
8 PQR 2020 12 300 NA
9 PQR 2021 1 400 200
10 PQR 2021 3 500 300
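Another option, as a sketch (not from the original answers; it assumes the zoo package is available), is a right-aligned rolling mean combined with lag():
library(dplyr)
library(zoo)

DF %>%
  arrange(ID, YEAR, MONTH) %>%
  group_by(ID) %>%
  mutate(`3MA` = lag(rollmeanr(AMOUNT, k = 3, fill = NA))) %>%
  ungroup()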


Fill a column down, recursively applying a function to the previous value

Consider this data
data <- data.frame(
group = c(rep("A", 10), rep("B", 10)),
year = rep(2016:2025, 2),
value = c(10, 20, 30, 40, NA, NA, NA, NA, NA, NA,
70, 80, NA, NA, NA, NA, NA, NA, NA, NA)
)
data
#> group year value
#> 1 A 2016 10
#> 2 A 2017 20
#> 3 A 2018 30
#> 4 A 2019 40
#> 5 A 2020 NA
#> 6 A 2021 NA
#> 7 A 2022 NA
#> 8 A 2023 NA
#> 9 A 2024 NA
#> 10 A 2025 NA
#> 11 B 2016 70
#> 12 B 2017 80
#> 13 B 2018 NA
#> 14 B 2019 NA
#> 15 B 2020 NA
#> 16 B 2021 NA
#> 17 B 2022 NA
#> 18 B 2023 NA
#> 19 B 2024 NA
#> 20 B 2025 NA
Now I'd like to fill in the missing values by applying an arbitrary function to the last non-missing value, recursively. For example, let's say I want the values to increase by 10 points each year. Rows where value is not NA should remain unmodified; starting from the first row where value is NA, the rule value_t = value_{t-1} + 10 should be applied.
A naive attempt would be to use dplyr::lag, but this only works for the first missing value, because lag is vectorized: it operates on the value vector as a whole and does not recurse over the previously filled values.
library(dplyr)
data |>
group_by(group) |>
mutate(value_fix = dplyr::lag(value) + 10)
#> # A tibble: 20 × 4
#> # Groups: group [2]
#> group year value value_fix
#> <chr> <int> <dbl> <dbl>
#> 1 A 2016 10 NA
#> 2 A 2017 20 20
#> 3 A 2018 30 30
#> 4 A 2019 40 40
#> 5 A 2020 NA 50
#> 6 A 2021 NA NA
#> 7 A 2022 NA NA
#> 8 A 2023 NA NA
#> 9 A 2024 NA NA
#> 10 A 2025 NA NA
#> 11 B 2016 70 NA
#> 12 B 2017 80 80
#> 13 B 2018 NA 90
#> 14 B 2019 NA NA
#> 15 B 2020 NA NA
#> 16 B 2021 NA NA
#> 17 B 2022 NA NA
#> 18 B 2023 NA NA
#> 19 B 2024 NA NA
#> 20 B 2025 NA NA
This is where I thought tidyr could help, because it is somewhat similar
to tidyr::fill
data |>
group_by(group) |>
tidyr::fill(value)
but ideally with an .f argument that is applied recursively to the last value. There does not seem to be anything like that.
Googling around, I came up with this solution:
data |>
group_by(group) |>
mutate(last_value = case_when(
value == dplyr::last(na.omit(value)) ~ value,
TRUE ~ NA_real_
)) |>
mutate(value_fix = purrr::accumulate(
.x = last_value,
.f = ~ coalesce(.x + 10, .y)
))
#> # A tibble: 20 × 5
#> # Groups: group [2]
#> group year value last_value value_fix
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 A 2016 10 NA NA
#> 2 A 2017 20 NA NA
#> 3 A 2018 30 NA NA
#> 4 A 2019 40 40 40
#> 5 A 2020 NA NA 50
#> 6 A 2021 NA NA 60
#> 7 A 2022 NA NA 70
#> 8 A 2023 NA NA 80
#> 9 A 2024 NA NA 90
#> 10 A 2025 NA NA 100
#> 11 B 2016 70 NA NA
#> 12 B 2017 80 80 80
#> 13 B 2018 NA NA 90
#> 14 B 2019 NA NA 100
#> 15 B 2020 NA NA 110
#> 16 B 2021 NA NA 120
#> 17 B 2022 NA NA 130
#> 18 B 2023 NA NA 140
#> 19 B 2024 NA NA 150
#> 20 B 2025 NA NA 160
This works, but it seems kind of hacky and is not easy to read; it would almost be cleaner to just write the loop and be done with it.
I really thought that for such a simple case there would be a built-in (vectorized, readable) way to do this in the tidyverse, but I could not find one. Am I missing something? Any ideas on how to do this better?
Created on 2022-08-30 with reprex v2.0.2
library(dplyr)
library(purrr)

custom_fun <- function(x, y) {
  if (is.na(y)) x + 10 else y   # keep y when present, otherwise add 10 to the running value
}

data %>%
  group_by(group) %>%
  mutate(value = accumulate(value, custom_fun))
# Groups: group [2]
group year value
<chr> <int> <dbl>
1 A 2016 10
2 A 2017 20
3 A 2018 30
4 A 2019 40
5 A 2020 50
6 A 2021 60
7 A 2022 70
8 A 2023 80
9 A 2024 90
10 A 2025 100
11 B 2016 70
12 B 2017 80
13 B 2018 90
14 B 2019 100
15 B 2020 110
16 B 2021 120
17 B 2022 130
18 B 2023 140
19 B 2024 150
20 B 2025 160
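For reference (my own note, not part of the original answer): accumulate() passes the running result as .x and the next element as .y, so the same idea can be written inline with a formula lambda.
library(dplyr)
library(purrr)

data %>%
  group_by(group) %>%
  mutate(value = accumulate(value, ~ if (is.na(.y)) .x + 10 else .y))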
A base solution with ave() + Reduce(accumulate = TRUE):
transform(data, value = ave(value, group, FUN = \(val) {
Reduce(\(x, y) if(is.na(y)) x + 10 else y, val, accumulate = TRUE)
}))
group year value
1 A 2016 10
2 A 2017 20
3 A 2018 30
4 A 2019 40
5 A 2020 50
6 A 2021 60
7 A 2022 70
8 A 2023 80
9 A 2024 90
10 A 2025 100
11 B 2016 70
12 B 2017 80
13 B 2018 90
14 B 2019 100
15 B 2020 110
16 B 2021 120
17 B 2022 130
18 B 2023 140
19 B 2024 150
20 B 2025 160
You could also write a vectorized function:
fun <- function(x){
  idx <- is.na(x)                       # positions of the missing values
  b <- rle(idx)                         # runs of missing / non-missing values
  id2 <- cumsum(b$lengths)[!b$values]   # index of the last value in each non-missing run
  x[idx] <- sequence(b$lengths[b$values], x[id2] + 10, by = 10)  # fill each NA run with last value + 10, + 20, ...
  x
}
transform(data, value = fun(value))
group year value
1 A 2016 10
2 A 2017 20
3 A 2018 30
4 A 2019 40
5 A 2020 50
6 A 2021 60
7 A 2022 70
8 A 2023 80
9 A 2024 90
10 A 2025 100
11 B 2016 70
12 B 2017 80
13 B 2018 90
14 B 2019 100
15 B 2020 110
16 B 2021 120
17 B 2022 130
18 B 2023 140
19 B 2024 150
20 B 2025 160
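One caveat (my own note): fun() is applied to the whole column at once, which works here only because every run of NAs is preceded by a non-NA value in the same group. A safer variant, as a sketch, applies it per group:
transform(data, value = ave(value, group, FUN = fun))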

How to get a value for the next year

Considering the following dataset:
Company name   Year   Customers
Company A      2018   100
Company B      2018   120
Company C      2018   150
Company A      2019   120
Company B      2019   180
Company C      2019   80
Company A      2020   200
Company B      2020   500
Company C      2020   140
What I want to do is measure the future return of the customers, so I need the customer count for the next year in a new column. Something like this:
Company name   Year   Customers   Customers next year
Company A      2018   100         120
Company B      2018   120         180
Company C      2018   150         80
Company A      2019   120         200
Company B      2019   180         500
Company C      2019   80          140
Company A      2020   200         NA
Company B      2020   500         NA
Company C      2020   140         NA
Does anybody have any idea how to do this?
You can use lead(), like in the following code:
library(dplyr)
df %>%
group_by(Company) %>%
mutate(customers_next_year = lead(Customers)) %>%
ungroup()
Output:
# A tibble: 9 × 4
Company Year Customers customers_next_year
<chr> <dbl> <dbl> <dbl>
1 A 2018 100 120
2 B 2018 120 180
3 C 2018 150 80
4 A 2019 120 200
5 B 2019 180 500
6 C 2019 80 140
7 A 2020 200 NA
8 B 2020 500 NA
9 C 2020 140 NA
Data
df <- data.frame(Company = rep(c("A", "B", "C"), 3),
Year = c(2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020),
Customers = c(100,120,150,120,180,80,200,500,140))
A method without external packages:
within(df, {
  customers_next_year <- ave(Customers, Company, FUN = \(x) c(x[-1], NA))  # shift each group's values up by one, i.e. a manual lead
})
# Company Year Customers customers_next_year
# 1 A 2018 100 120
# 2 B 2018 120 180
# 3 C 2018 150 80
# 4 A 2019 120 200
# 5 B 2019 180 500
# 6 C 2019 80 140
# 7 A 2020 200 NA
# 8 B 2020 500 NA
# 9 C 2020 140 NA

Checking if a statement is true for three consecutive years in R

I am trying to create some code to determine whether a certain condition holds for three consecutive years, for example that the amount is negative for three consecutive years. Only if that is true should the third row of the streak be marked with a 1 in a new column; otherwise the row should be marked with a 0.
To further illustrate what I am trying to do, I've created the following dataset:
ID    Year   Amount
001   2022   2
001   2021   -9
001   2020   -10
001   2019   -16
001   2018   -20
002   2022   400
002   2021   300
002   2020   -200
002   2019   -600
002   2018   -500
And the outcome should look somewhat like this:
ID    Year   Amount   Y/N
001   2022   2        0
001   2021   -9       1
001   2020   -10      1
001   2019   -16      0
001   2018   -20      0
002   2022   400      0
002   2021   300      0
002   2020   -200     1
002   2019   -600     0
002   2018   -500     0
Does anybody know how I could code this?
This should do it:
df <- read.table(h=T, text="ID Year Amount
001 2022 2
001 2021 -9
001 2020 -10
001 2019 -16
001 2018 -20
002 2022 400
002 2021 300
002 2020 -200
002 2019 -600
002 2018 -500")
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(YN = rev(diff(c(0,0,0, cumsum(rev(Amount)<0)), 3) == 3)) %>%
ungroup()
#> # A tibble: 10 × 4
#> ID Year Amount YN
#> <int> <int> <int> <lgl>
#> 1 1 2022 2 FALSE
#> 2 1 2021 -9 TRUE
#> 3 1 2020 -10 TRUE
#> 4 1 2019 -16 FALSE
#> 5 1 2018 -20 FALSE
#> 6 2 2022 400 FALSE
#> 7 2 2021 300 FALSE
#> 8 2 2020 -200 TRUE
#> 9 2 2019 -600 FALSE
#> 10 2 2018 -500 FALSE
Created on 2022-04-21 by the reprex package (v2.0.1)
In base R:
df$YN <- ave(df$Amount, df$ID,
             FUN = function(x) rev(diff(c(0, 0, 0, cumsum(rev(x) < 0)), 3) == 3)) == 1
Here's a walkthrough:
df %>%
arrange(ID, Year) %>%
group_by(ID) %>%
mutate(
cond = Amount < 0,
cumsum = cumsum(cond),
diff = diff(c(0,0,0, cumsum), 3),
YN = diff == 3) # %>% arrange(ID, desc(Year)) # to put back in original shape
#> # A tibble: 10 × 7
#> # Groups: ID [2]
#> ID Year Amount cond cumsum diff YN
#> <int> <int> <int> <lgl> <int> <dbl> <lgl>
#> 1 1 2018 -20 TRUE 1 1 FALSE
#> 2 1 2019 -16 TRUE 2 2 FALSE
#> 3 1 2020 -10 TRUE 3 3 TRUE
#> 4 1 2021 -9 TRUE 4 3 TRUE
#> 5 1 2022 2 FALSE 4 2 FALSE
#> 6 2 2018 -500 TRUE 1 1 FALSE
#> 7 2 2019 -600 TRUE 2 2 FALSE
#> 8 2 2020 -200 TRUE 3 3 TRUE
#> 9 2 2021 300 FALSE 3 2 FALSE
#> 10 2 2022 400 FALSE 3 1 FALSE
And an easier-to-read solution, though it does not generalise to longer spans:
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(YN = (Amount < 0) + lead(Amount < 0, default = 0) + lead(Amount < 0, 2, default = 0) == 3) %>%
ungroup()
Using the dataset
df <- data.frame(id = c(rep(1, 5), rep(2, 5)),
                 year = rep(seq(2022, 2018), 2),
                 amount = c(2, -9, -10, -16, -20, 400, 300, -200, -600, -500))
A sapply function can be used:
df$test <- ifelse(
  sapply(seq(1, nrow(df)), function(x)
    all(subset(df, df$id == df$id[x] & df$year %in% seq(df$year[x] - 2, df$year[x]))$amount < 0) &
      nrow(subset(df, df$id == df$id[x] & df$year %in% seq(df$year[x] - 2, df$year[x]))) > 2),
  1, 0)
That gives:
df
id year amount test
1 1 2022 2 0
2 1 2021 -9 1
3 1 2020 -10 1
4 1 2019 -16 0
5 1 2018 -20 0
6 2 2022 400 0
7 2 2021 300 0
8 2 2020 -200 1
9 2 2019 -600 0
10 2 2018 -500 0
Breakdown:
ifelse() converts the TRUE/FALSE returned by sapply() into 1/0
sapply() iterates over the rows
all() tests whether the condition holds for every value in the subset
subset() selects the rows with the same id in the last three years
$amount < 0 is the condition being tested
You could use the data.table package.
library(data.table)
setDT(df)
# per ID, sum the "Amount < 0" flag over the current and two previous years;
# match() maps a sum of 3 to 1 and anything else to 0
df[order(Year), YN := match(Reduce(`+`, shift(Amount<0, 0:2)), 3, 0), ID]
# ID Year Amount YN
# 1: 1 2022 2 0
# 2: 1 2021 -9 1
# 3: 1 2020 -10 1
# 4: 1 2019 -16 0
# 5: 1 2018 -20 0
# 6: 2 2022 400 0
# 7: 2 2021 300 0
# 8: 2 2020 -200 1
# 9: 2 2019 -600 0
# 10: 2 2018 -500 0
data
df = structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
Year = c(2022, 2021, 2020, 2019, 2018, 2022, 2021, 2020, 2019, 2018),
Amount = c(2, -9, -10, -16, -20, 400, 300, -200, -600, -500)),
row.names = c(NA, -10L),
class = "data.frame")

Duplicate rows in dataframe

I have a data.frame which looks like so:
df <- data.frame(id=c("001","002","003","004"),year=c(2015,2015,2015,2015),
x1=c(15,20,25,30),x2=c(1,2,3,4))
id year x1 x2
001 2015 15 1
002 2015 20 2
003 2015 25 3
004 2015 30 4
I would like to duplicate id, x1, and x2 but change the year to end up with a data.frame that resembles the following:
id year x1 x2
001 2015 15 1
002 2015 20 2
003 2015 25 3
004 2015 30 4
001 2016 15 1
002 2016 20 2
003 2016 25 3
004 2016 30 4
I can achieve this by doing
df2 <- df %>%
mutate(year = 2016)
df3 <- rbind(df, df2)
But I am wondering if there is a more intuitive way, so that I can create duplicates for 20+ years without needing to make multiple new data.frames?
df <- data.frame(id=c("001","002","003","004"),year=c(2015,2015,2015,2015),
x1=c(15,20,25,30),x2=c(1,2,3,4))
library(tidyr)
df %>% complete(nesting(id, x1, x2), year = 2015:2016)
#> # A tibble: 8 x 4
#> id x1 x2 year
#> <chr> <dbl> <dbl> <dbl>
#> 1 001 15 1 2015
#> 2 001 15 1 2016
#> 3 002 20 2 2015
#> 4 002 20 2 2016
#> 5 003 25 3 2015
#> 6 003 25 3 2016
#> 7 004 30 4 2015
#> 8 004 30 4 2016
For extra years you just need to change 2015:2016 as required; you can also build the range dynamically with seq() (see the sketch after the next code block).
library(tidyverse)
df <- data.frame(id=c("001","002","003","004"),year=c(2015,2015,2015,2015),
x1=c(15,20,25,30),x2=c(1,2,3,4))
map_dfr(0:1, ~mutate(df, year = year + .x))
#> id year x1 x2
#> 1 001 2015 15 1
#> 2 002 2015 20 2
#> 3 003 2015 25 3
#> 4 004 2015 30 4
#> 5 001 2016 15 1
#> 6 002 2016 20 2
#> 7 003 2016 25 3
#> 8 004 2016 30 4
Created on 2021-06-16 by the reprex package (v2.0.0)
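For instance, to replicate the rows for 20 additional years, a sketch along the same lines (the range 0:20 is just an illustrative choice):
library(dplyr)
library(purrr)

extra_years <- seq(0, 20)   # the original year plus the next 20
map_dfr(extra_years, ~ mutate(df, year = year + .x))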

Compute a deflation factor to index wages, by CPI, in panel data

I'm struggling to understand exactly how to compute a deflation factor for wages in a panel based on inflation. I've written the R example below to help illustrate the issue.
In Wooldridge (2009:452), Introductory Econometrics, 5th ed., he creates a deflation factor by dividing 107.6 by 65.2, i.e. 107.6/65.2 ≈ 1.65, but I can't figure out how to apply this to my own panel data. Wooldridge only mentions the deflation factor in passing.
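In other words (my own illustration, not from Wooldridge): the deflation factor for year t with base year b is CPI_b / CPI_t, and the real wage is the nominal wage multiplied by that factor. A tiny numeric sketch:
cpi_base <- 107.6                      # CPI in the chosen base year (Wooldridge's example)
cpi_t    <- 65.2                       # CPI in year t
deflation_factor <- cpi_base / cpi_t   # approximately 1.65
nominal_wage <- 30000                  # an arbitrary nominal wage, for illustration only
real_wage <- nominal_wage * deflation_factor   # the wage expressed in base-year prices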
Say I have a mini panel with two people, Jane and Tom, starting in 2009 and 2006 respectively and running until 2015, with their yearly wages:
# install.packages(c("dplyr"), dependencies = TRUE)
library(dplyr)
set.seed(2)
tbl <- tibble(id = rep(c('Jane', 'Tom'), c(7, 10)),
yr = c(2009:2015, 2006:2015),
wg = c(rnorm(7, mean=5.1*10^4, sd=9), rnorm(10, 4*10^4, 12))
); tbl
#> A tibble: 17 x 3
#> id yr wg
#> <chr> <int> <dbl>
#> 1 Jane 2009 50991.93
#> 2 Jane 2010 51001.66
#> 3 Jane 2011 51014.29
#> 4 Jane 2012 50989.83
#> 5 Jane 2013 50999.28
#> 6 Jane 2014 51001.19
#> 7 Jane 2015 51006.37
#> 8 Tom 2006 39997.12
#> 9 Tom 2007 40023.81
#> 10 Tom 2008 39998.33
#> 11 Tom 2009 40005.01
#> 12 Tom 2010 40011.78
#> 13 Tom 2011 39995.29
#> 14 Tom 2012 39987.52
#> 15 Tom 2013 40021.39
#> 16 Tom 2014 39972.27
#> 17 Tom 2015 40010.54
I now get the consumer price index (CPI) (using this answer)
# install.packages(c("Quandl"), dependencies = TRUE)
CPI00to16 <- Quandl::Quandl("FRED/CPIAUCSL", collapse="annual",
start_date="2000-01-01", end_date="2016-01-01")
as_tibble(CPI00to16)
#> # A tibble: 17 x 2
#> Date Value
#> <date> <dbl>
#> 1 2016-12-31 238.106
#> 2 2015-12-31 237.846
#> 3 2014-12-31 236.290
#> 4 2013-12-31 234.723
#> 5 2012-12-31 231.221
#> 6 2011-12-31 227.223
#> 7 2010-12-31 220.472
#> 8 2009-12-31 217.347
#> 9 2008-12-31 211.398
#> 10 2007-12-31 211.445
#> 11 2006-12-31 203.100
#> 12 2005-12-31 198.100
#> 13 2004-12-31 191.700
#> 14 2003-12-31 185.500
#> 15 2002-12-31 181.800
#> 16 2001-12-31 177.400
#> 17 2000-12-31 174.600
My question is: how do I deflate Jane's and Tom's wages (cf. Wooldridge 2009), using 2015 as the base year?
Update, following MrSmithGoesToWashington's comment below:
CPI00to16$yr <- as.numeric(format(CPI00to16$Date, '%Y'))
# row 2 of the Quandl result holds the 2015 CPI, the chosen base year
CPI00to16 <- mutate(CPI00to16, deflation_factor = CPI00to16[2, 2] / Value)
df <- tbl %>% inner_join(as_tibble(CPI00to16[, 3:4]), by = "yr")
df <- mutate(df, wg_defl = deflation_factor * wg, wg_diff = wg_defl - wg)
df
#> # A tibble: 17 x 6
#> id yr wg deflation_factor wg_defl wg_diff
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Jane 2009 50991.93 1.094315 55801.21 4809.2844
#> 2 Jane 2010 51001.66 1.078804 55020.78 4019.1176
#> 3 Jane 2011 51014.29 1.046751 53399.28 2384.9910
#> 4 Jane 2012 50989.83 1.028652 52450.80 1460.9728
#> 5 Jane 2013 50999.28 1.013305 51677.83 678.5477
#> 6 Jane 2014 51001.19 1.006585 51337.04 335.8494
#> 7 Jane 2015 51006.37 1.000000 51006.37 0.0000
#> 8 Tom 2006 39997.12 1.171078 46839.76 6842.6394
#> 9 Tom 2007 40023.81 1.124860 45021.18 4997.3691
#> 10 Tom 2008 39998.33 1.125110 45002.53 5004.1909
#> 11 Tom 2009 40005.01 1.094315 43778.07 3773.0575
#> 12 Tom 2010 40011.78 1.078804 43164.86 3153.0747
#> 13 Tom 2011 39995.29 1.046751 41865.12 1869.8369
#> 14 Tom 2012 39987.52 1.028652 41133.26 1145.7322
#> 15 Tom 2013 40021.39 1.013305 40553.87 532.4863
#> 16 Tom 2014 39972.27 1.006585 40235.49 263.2225
#> 17 Tom 2015 40010.54 1.000000 40010.54 0.0000
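The same computation written a bit more explicitly, as a sketch (assuming the CPI00to16 data frame from above; the names cpi and base_cpi are my own):
library(dplyr)

cpi <- as_tibble(CPI00to16) %>%
  mutate(yr = as.integer(format(Date, "%Y"))) %>%
  select(yr, cpi = Value)

base_cpi <- cpi$cpi[cpi$yr == 2015]   # CPI of the base year, 2015

tbl %>%
  inner_join(cpi, by = "yr") %>%
  mutate(deflation_factor = base_cpi / cpi,
         wg_defl = wg * deflation_factor)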
