Use a model object, e.g. panelmodel, to flag the data used, in R

Is it possible in some way to use a fit object, specifically the regression object I get from a plm() model, to flag observations in the data used for the regression, indicating whether they were in fact used in the regression? I realize this could be done by looking for complete observations in my original data, but I am curious if there's a way to use the fit/reg object to flag the data.
Let me illustrate my issue with a minimal working example.
First, some packages needed:
# install.packages(c("stargazer", "plm", "tidyverse"), dependencies = TRUE)
library(plm); library(stargazer); library(tidyverse)
Second, some data; this example draws heavily on Baltagi (2013), table 3.1, found in ?plm:
data("Grunfeld", package = "plm")
dta <- Grunfeld
Now I create some semi-random missing values in my data object, dta:
dta[c(3:13),3] <- NA; dta[c(22:28),4] <- NA; dta[c(30:33),5] <- NA
The final step in the data preparation is to create a data frame with an index attribute that describes its individual and time dimensions, using the tidyverse:
dta.p <- dta %>% group_by(firm, year)
Now to the regression:
plm.reg <- plm(inv ~ value + capital, data = dta.p, model = "pooling")
The results, using stargazer:
stargazer(plm.reg, type="text") # stargazer(dta, type="text")
#> =============================================
#>                       Dependent variable:
#>                       ---------------------------
#>                                inv
#> ---------------------------------------------
#> value                        0.114***
#>                              (0.008)
#>
#> capital                      0.237***
#>                              (0.028)
#>
#> Constant                    -47.962***
#>                              (9.252)
#>
#> ---------------------------------------------
#> Observations                   178
#> R2                            0.799
#> Adjusted R2                   0.797
#> F Statistic           348.176*** (df = 2; 175)
#> =============================================
#> Note:             *p<0.1; **p<0.05; ***p<0.01
Say I know my data has 200 observations, and I want to find the 178 that were used in the regression.
I am wondering if there's some vector in plm.reg I can (easily) use to create a flag in my original data, dta, indicating whether each observation was used or not, i.e. excluding the semi-random missing values I created above. Maybe some broom-like tool.
I imagine something like:
dta <- dta %>% valid_reg_obs(plm.reg)
The desired outcome would look something like this; the new element is the plm.reg column at the end:
dta %>% as_tibble()
#> # A tibble: 200 x 6
#> firm year inv value capital plm.reg
#> * <int> <int> <dbl> <dbl> <dbl> <lgl>
#> 1 1 1935 318 3078 2.80 T
#> 2 1 1936 392 4662 52.6 T
#> 3 1 1937 NA 5387 157 F
#> 4 1 1938 NA 2792 209 F
#> 5 1 1939 NA 4313 203 F
#> 6 1 1940 NA 4644 207 F
#> 7 1 1941 NA 4551 255 F
#> 8 1 1942 NA 3244 304 F
#> 9 1 1943 NA 4054 264 F
#> 10 1 1944 NA 4379 202 F
#> # ... with 190 more rows
Update: I tried to use broom's augment(), but unfortunately it gave me an error message instead of the flag I had hoped for:
# install.packages(c("broom"), dependencies = TRUE)
library(broom)
augment(plm.reg, dta)
#> Error in data.frame(..., check.names = FALSE) :
#> arguments imply differing number of rows: 200, 178

The vector is plm.reg$residuals. Not sure of a nice broom solution, but this seems to work:
library(tidyverse)
dta.p %>%
  as.data.frame() %>%
  rowid_to_column() %>%
  mutate(plm.reg = rowid %in% names(plm.reg$residuals))
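An alternative the question itself hints at is to flag complete cases on just the model variables, which should give the same flag for this pooling model; a sketch:
dta %>%
  mutate(plm.reg = complete.cases(inv, value, capital)) %>%
  as_tibble()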
For people who use pdata.frame() to create an index attribute that describes the individual and time dimensions, you can use the following code, taken from another Baltagi example in ?plm:
# == Baltagi (2013), pp. 204-205
data("Produc", package = "plm")
pProduc <- pdata.frame(Produc, index = c("state", "year", "region"))
form <- log(gsp) ~ log(pc) + log(emp) + log(hwy) + log(water) + log(util) + unemp
Baltagi_reg_204_5 <- plm(form, data = pProduc, model = "random", effect = "nested")
pProduc %>%
  mutate(reg.re = rownames(pProduc) %in% names(Baltagi_reg_204_5$residuals)) %>%
  as_tibble() %>%
  select(state, year, region, reg.re)
#> # A tibble: 816 x 4
#> state year region reg.re
#> <fct> <fct> <fct> <lgl>
#> 1 CONNECTICUT 1970 1 T
#> 2 CONNECTICUT 1971 1 T
#> 3 CONNECTICUT 1972 1 T
#> 4 CONNECTICUT 1973 1 T
#> 5 CONNECTICUT 1974 1 T
#> 6 CONNECTICUT 1975 1 T
#> 7 CONNECTICUT 1976 1 T
#> 8 CONNECTICUT 1977 1 T
#> 9 CONNECTICUT 1978 1 T
#> 10 CONNECTICUT 1979 1 T
#> # ... with 806 more rows
Finally, if you are running the first Baltagi example without index attributes, i.e. the unmodified example from the help file (where the model object is named p), the code should be:
Grunfeld %>%
  rowid_to_column() %>%
  mutate(plm.reg = rowid %in% names(p$residuals)) %>%
  as_tibble()
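As a quick sanity check, the number of flagged rows should match the observation count stargazer reported; a sketch on the dta example above:
dta %>%
  rowid_to_column() %>%
  summarise(n_used = sum(rowid %in% names(plm.reg$residuals)))
# n_used should equal 178, matching stargazer's Observations row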

Related

Trying to extract specific characters in a column in R?

The content in the column appears as follows: $1,521+ 2 bds. I want to extract 1521 and put it in a new column. I know this can be done in Alteryx using regex; can I do it in R?
How about the following?
library(tidyverse)
x <- '$1,521+ 2 bds'
parse_number(x)
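parse_number() drops the leading currency symbol and the grouping comma, so this should return:
#> [1] 1521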
For example:
library(tidyverse)
#generate some data
tbl <- tibble(string = str_c('$', as.character(seq(1521, 1541, 1)), '+', ' 2bds'))
new_col <- tbl$string %>%
  str_split('\\+', simplify = TRUE) %>%
  `[`(, 1) %>%
  str_sub(2, -1) # get rid of '$' at the start
mutate(tbl, number = new_col)
#> # A tibble: 21 x 2
#> string number
#> <chr> <chr>
#> 1 $1521+ 2bds 1521
#> 2 $1522+ 2bds 1522
#> 3 $1523+ 2bds 1523
#> 4 $1524+ 2bds 1524
#> 5 $1525+ 2bds 1525
#> 6 $1526+ 2bds 1526
#> 7 $1527+ 2bds 1527
#> 8 $1528+ 2bds 1528
#> 9 $1529+ 2bds 1529
#> 10 $1530+ 2bds 1530
#> # … with 11 more rows
Created on 2021-06-12 by the reprex package (v2.0.0)
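Note that parse_number() from the first snippet also works directly inside mutate(), avoiding the manual string surgery; a sketch on the same tbl:
mutate(tbl, number = parse_number(string))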
We can use sub from base R
as.numeric( sub("\\$(\\d+),(\\d+).*", "\\1\\2", x))
#[1] 1521
data
x <- '$1,521+ 2 bds'
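The same pattern is vectorised, so it can fill a whole column in one call; a sketch with made-up data:
df <- data.frame(string = c('$1,521+ 2 bds', '$2,350+ 3 bds'))
df$number <- as.numeric(sub("\\$(\\d+),(\\d+).*", "\\1\\2", df$string))
df$number
#[1] 1521 2350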

How to create a rate in R

I want to change my data so that it gives me the rate of pedestrians relative to each state's population. I am using a linear model and my summary values look like this:
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.087061   0.029876   2.914  0.00438 **
intersection  0.009192   0.003086   2.978  0.00362 **
Here, my beta estimate for intersection is 0.009192, which is hard to interpret on its own: the same count means something very different in a state with a small population than in one with a large population.
Below is a condensed version of my data without all the columns I use, but here is the link to the csv in case someone wants to download it from there.
> head(c)
# A tibble: 6 x 15
STATE STATENAME PEDS PERSONS PERMVIT PERNOTMVIT COUNTY COUNTYNAME CITY DAY MONTH YEAR LATITUDE LONGITUD
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Alabama 0 3 3 0 81 LEE (81) 2340 7 2 2019 32.7 -85.3
2 1 Alabama 0 2 2 0 55 ETOWAH (55) 1280 23 1 2019 34.0 -86.1
3 1 Alabama 0 4 4 0 29 CLEBURNE (29) 0 22 1 2019 33.7 -85.4
4 1 Alabama 1 1 1 1 55 ETOWAH (55) 2562 22 1 2019 34.0 -86.1
5 1 Alabama 0 1 1 0 3 BALDWIN (3) 0 18 1 2019 30.7 -87.8
6 1 Alabama 0 2 2 0 85 LOWNDES (85) 0 7 1 2019 32.2 -86.4
# … with 1 more variable: FATALS <dbl>
Here is the code that runs through the process I am doing. I don't see how I can change it so that each value is a rate (values like PEDS or TYP_INT).
# Libraries
rm(list = ls()) # clear anything in memory
library(leaflet)
library(tidyverse)
library(ggmap)
library(leaflet.extras)
library(htmltools)
library(ggplot2)
library(maps)
library(mapproj)
library(mapdata)
library(zoo)
library(tsibble)
setwd("~/Desktop/Statistics790/DataSets/FARS2019NationalCSV")
df <- read.csv("accident.csv")
state <- unique(df$STATE)
for (i in state) {
  df1 <- df %>%
    filter(STATE == i) %>%
    dplyr::select(c(STATE, PEDS, DAY, MONTH, YEAR, TYP_INT)) %>%
    mutate(date = as.Date(as.character(paste(YEAR, MONTH, DAY, sep = "-"), "%Y-%m-%d"))) %>% # create a date
    group_by(date) %>% # group by date
    # summarise_at(.vars = vars(PEDS), sum)
    summarise(pedday = sum(PEDS), intersection = mean(TYP_INT))
  # ts1 <- ts(df, start = c(2019, 1, 1), frequency = 365)
  setwd("~/Desktop/Statistics790/States_ts/figures")
  plots <- df1 %>%
    ggplot() +
    geom_line(aes(x = date, y = pedday)) + ylim(0, 13) +
    theme_bw()
  ggsave(paste0("state_", i, ".png"), width = 8, height = 6)
  ts1 <- ts(df1, start = c(2019, 1, 1), frequency = 365)
  setwd("~/Desktop/Statistics790/States_ts")
  ts1 %>% write.csv(paste0("state_", i, ".csv"), row.names = FALSE)
  # Plots
}
# date1 <- as.character(df$date)
# df1 <- df %>% filter(STATE == "1")
# ts2 <- xts(df, order.by = as.Date(df$date, "%Y-%m-%d"))
setwd("~/Desktop/Statistics790/States_ts")
cat("\f")
# df <- read.csv(paste0("state_1.csv"))
# print("------Linear Model------")
# summary(lm(pedday ~ weather, data = df))
for (i in state) {
  print(paste0("-------------------------Analysis for State: ", i, " -------------------------------"))
  df <- read.csv(paste0("state_", i, ".csv"))
  print("------Linear Model------")
  print(summary(lm(pedday ~ intersection, data = df)))
}
Collating my answers from the comments: you need to get state population data from an outside source such as the US Census (https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html#par_textimage_1574439295), read it in, join it to your dataset, and then calculate the rate as pedestrians per population, scaled for ease of reading on the graph. You can also make your code faster by taking some of the calculations out of the loop. The code below assumes the census data is called 'census.csv' and has columns 'Geographic Area' for state and 'X2019' for the most recent population data available.
pop <- read.csv('census.csv')
df <- read.csv('accident.csv') %>%
  left_join(pop, by = c('STATENAME' = 'Geographic Area')) %>%
  mutate(rate = (PEDS / X2019) * <scale>) %>% # <scale> = a readability multiplier of your choice
  mutate(date = as.Date(as.character(paste(YEAR, MONTH, DAY, sep = "-"), "%Y-%m-%d")))
The left_join will match state names and give each row a population value depending on its state, regardless of how many rows there are.
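For example, with <scale> set to 100000, rate becomes pedestrians per 100,000 residents; a sketch using the column names assumed above:
df <- df %>% mutate(rate = (PEDS / X2019) * 100000)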

Time series forecasting in R; plotting "events" and generating new forecasting plots with specified date range after initial forecast

I have created a function which allows me to carry out time series forecasting using the fable package. The idea of the function was to analyse observed vs predicted values after a particular date/event. Here is a mock data frame which generates a column of dates:
set.seed(1)
df <- data.frame(Date = sort(sample(seq(as.Date('2018/01/01'), as.Date('2020/09/17'), by="day"),1368883, replace = T)))
And here is the function I created. You specify the data, then the date of the event, then the forecast period in days, and lastly a title for your graph.
event_analysis <- function(data, eventdate, period, title){
  require(dplyr)
  require(tsibble)
  require(fable)
  require(fabletools)
  require(imputeTS)
  require(ggplot2)
  data_count <- data %>%
    group_by(Date) %>%
    summarise(Count = n())
  data_count <- as_tsibble(data_count)
  data_count <- na_mean(data_count)
  train <- data_count %>%
    # sample_frac(0.8)
    filter(Date <= as.Date(eventdate))
  fit <- train %>%
    model(
      ets = ETS(Count),
      arima = ARIMA(Count),
      snaive = SNAIVE(Count)
    ) %>%
    mutate(mixed = (ets + arima + snaive) / 3)
  fc <- fit %>% forecast(h = period)
  forecastplot <- fc %>%
    autoplot(data_count, level = NULL) + ggtitle(title) +
    geom_vline(xintercept = as.Date(eventdate), linetype = "dashed", color = "red") +
    labs(caption = "Red dashed line = Event occurrence")
  fc_accuracy <- accuracy(fc, data_count)
  # obs <- data_count
  # colnames(obs)[2] <- "Observed"
  # obs_pred <- merge(data_count, fc_accuracy, by = "Date")
  return(list(forecastplot, fc_accuracy, fc))
}
And in one run, I specify the df, the date of the event, the number of days that I want to forecast (3 weeks), then the title:
event_analysis(df, "2020-01-01", 21, "Event forecast")
This prints the accuracy outcome and the forecast plot.
I concede that the mock data I made isn't totally ideal, but the function works well on my real-world data.
Here is what I want to achieve. I would like the function to keep producing this output, but with an additional graph that "zooms in" on the period that has been forecasted, for two reasons:
for ease of interpretation
so that I can see the N days before and N days after the event date (N being the forecast period, i.e. 21).
So, an additional graph (along with the original full forecast), perhaps in a single "multiplot"-style output.
The other thing would be to print another output which shows the observed values in the test set against the predicted values from the models used in the forecasting.
These are basically the two additional things I want to add to my function, but I am not sure how to go about this. Any help is massively appreciated :)
I suppose you could rewrite it this way. I made a couple of adjustments to help you out.
set.seed(1)
df <- data.frame(Date = sort(sample(seq(as.Date('2018/01/01'), as.Date('2020/09/17'), by = "day"), 1368883, replace = TRUE)))
event_analysis <- function(data, eventdate, period, title){
  # in the future, you may want to move these out of the function
  library(dplyr)
  library(tsibble)
  library(fable)
  library(fabletools)
  library(imputeTS)
  library(ggplot2)
  # convert at the beginning
  eventdate <- as.Date(eventdate)
  # more compact syntax
  data_count <- count(data, Date, name = "Count")
  # better to specify the date variable to avoid the message
  data_count <- as_tsibble(data_count, index = Date)
  # you need to complete missing dates, just in case
  data_count <- tsibble::fill_gaps(data_count)
  data_count <- na_mean(data_count)
  train <- data_count %>%
    filter(Date <= eventdate)
  test <- data_count %>%
    filter(Date > eventdate, Date <= (eventdate + period))
  fit <- train %>%
    model(
      ets = ETS(Count),
      arima = ARIMA(Count),
      snaive = SNAIVE(Count)
    ) %>%
    mutate(mixed = (ets + arima + snaive) / 3)
  fc <- fit %>% forecast(h = period)
  # your plot
  forecastplot <- fc %>%
    autoplot(data_count, level = NULL) +
    ggtitle(title) +
    geom_vline(xintercept = as.Date(eventdate), linetype = "dashed", color = "red") +
    labs(caption = "Red dashed line = Event occurrence")
  # plot just forecast and test
  zoomfcstplot <- autoplot(fc) + autolayer(test, .vars = Count)
  fc_accuracy <- accuracy(fc, data_count)
  ### EDIT: ###
  # results vs test
  res <- fc %>%
    as_tibble() %>%
    select(-Count) %>%
    tidyr::pivot_wider(names_from = .model, values_from = .mean) %>%
    inner_join(test, by = "Date")
  ##############
  return(list(forecastplot = forecastplot,
              zoomplot = zoomfcstplot,
              accuracy = fc_accuracy,
              forecast = fc,
              results = res))
}
event_analysis(df,
               eventdate = "2020-01-01",
               period = 21,
               title = "Event forecast")
Output:
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: fabletools
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
#> $forecastplot
#>
#> $zoomplot
#>
#> $accuracy
#> # A tibble: 4 x 9
#> .model .type ME RMSE MAE MPE MAPE MASE ACF1
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 arima Test -16.8 41.8 35.2 -1.31 2.61 0.791 0.164
#> 2 ets Test -16.8 41.8 35.2 -1.31 2.61 0.791 0.164
#> 3 mixed Test -21.9 44.7 36.8 -1.68 2.73 0.825 -0.0682
#> 4 snaive Test -32.1 57.3 46.6 -2.43 3.45 1.05 -0.377
#>
#> $forecast
#> # A fable: 84 x 4 [1D]
#> # Key: .model [4]
#> .model Date Count .mean
#> <chr> <date> <dist> <dbl>
#> 1 ets 2020-01-02 N(1383, 1505) 1383.
#> 2 ets 2020-01-03 N(1383, 1505) 1383.
#> 3 ets 2020-01-04 N(1383, 1505) 1383.
#> 4 ets 2020-01-05 N(1383, 1505) 1383.
#> 5 ets 2020-01-06 N(1383, 1505) 1383.
#> 6 ets 2020-01-07 N(1383, 1505) 1383.
#> 7 ets 2020-01-08 N(1383, 1505) 1383.
#> 8 ets 2020-01-09 N(1383, 1505) 1383.
#> 9 ets 2020-01-10 N(1383, 1505) 1383.
#> 10 ets 2020-01-11 N(1383, 1505) 1383.
#> # ... with 74 more rows
#>
#> $results
#> # A tibble: 21 x 6
#> Date ets arima snaive mixed Count
#> <date> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 2020-01-02 1383. 1383. 1386 1384. 1350
#> 2 2020-01-03 1383. 1383. 1366 1377. 1398
#> 3 2020-01-04 1383. 1383. 1426 1397. 1357
#> 4 2020-01-05 1383. 1383. 1398 1388. 1415
#> 5 2020-01-06 1383. 1383. 1431 1399. 1399
#> 6 2020-01-07 1383. 1383. 1431 1399. 1346
#> 7 2020-01-08 1383. 1383. 1350 1372. 1299
#> 8 2020-01-09 1383. 1383. 1386 1384. 1303
#> 9 2020-01-10 1383. 1383. 1366 1377. 1365
#> 10 2020-01-11 1383. 1383. 1426 1397. 1328
#> # ... with 11 more rows
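If you want the zoomed plot to show exactly the N days before and after the event (point 2 in the question), one option is to plot the forecast over a trimmed window of the observed series instead; a sketch, reusing the function's own variables:
zoomfcstplot <- fc %>%
  autoplot(filter(data_count,
                  Date >= eventdate - period,
                  Date <= eventdate + period),
           level = NULL) +
  ggtitle(paste(title, "(zoomed)"))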

loop to run model on subset dataframe

I am not very experienced with loops so I am not sure where I went wrong here...
I have a dataframe that looks like:
month year day mean.temp mean.temp.year.month
    1 1961   1      4.85             4.090323
    1 1961   2      4.90             4.090323
    1 1961   3      2.95             4.090323
    1 1961   4      3.40             4.090323
    1 1961   5      2.90             4.090323
A dataset showing 3 months for 2 years can be found here:
https://drive.google.com/file/d/1w7NVeoEh8b7cAkU3cu1sXx6yCh75Inqg/view?usp=sharing
I want to subset this dataframe by year and month so that I can run one nls model per year and month. Since my dataset contains 56 years (and each year has 12 months), that will give 672 models. Then I want to store the parameter estimates in a separate table.
I've created this code, but I can't work out why it is only giving me the parameter estimates for month 12 (all 56 years, but just month 12):
table <- matrix(99999, nrow = 672, ncol = 4)
YEARMONTHsel <- unique(df_weather[c("year", "month")])
YEARsel <- unique(df_weather$year)
MONTHsel <- unique(df_weather$month)
for (i in 1:length(YEARsel)) {
  for (j in 1:length(MONTHsel)) {
    temp2 <- df_weather[df_weather$year == YEARsel[i] & df_weather$month == MONTHsel[j], ]
    mn <- nls(mean.temp ~ mean.temp.year.month + alpha * sin(day * pi * 2 / 30 + phi),
              data = temp2, control = nlc,
              start = list(alpha = -6.07043, phi = -10))
    cr <- as.vector(coef(mn))
    nv <- length(coef(mn))
    table[i, 1:nv] <- cr
    table[i, nv + 1] <- YEARsel[i]
    table[i, nv + 2] <- MONTHsel[j]
  }
}
I've tried several options (i.e. without using nested loop) but I'm not getting anywhere.
Any help would be greatly appreciated! Thanks.
Based on your loop, it looks like you want to run the regression grouped by year and month and then extract the coefficients into a new dataframe (correct me if that's wrong).
library(readxl)
library(tidyverse)
df <- read_excel("~/Downloads/df_weather.xlsx")
df %>%
  nest(-month, -year) %>%
  mutate(model = map(data, ~ nls(mean.temp ~ mean.temp.year.month + alpha * sin(day * pi * 2 / 30 + phi),
                                 data = .x, control = nlc, # 'nlc' is the control object defined in the OP's script
                                 start = list(alpha = -6.07043, phi = -10))),
         coeff = map(model, ~ coefficients(.x))) %>%
  unnest(coeff %>% map(broom::tidy)) %>%
  spread(names, x) %>%
  arrange(year)
#> # A tibble: 6 x 4
#> month year alpha phi
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1961 0.561 -10.8
#> 2 2 1961 -1.50 -10.5
#> 3 3 1961 -2.06 -9.77
#> 4 1 1962 -3.35 -5.48
#> 5 2 1962 -2.27 -9.97
#> 6 3 1962 0.959 -10.8
First we nest the data based on your groups (in this case year and month), then we map the model for each group, then we map the coefficients for each group, and lastly we unnest the coefficients and spread the data from long to wide.
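As for why the original loop only kept month 12: both table[i, ...] assignments index the output matrix by i alone, so within each year the twelve monthly results are written to the same row, and month 12 overwrites the rest. If you prefer to keep the loop, a combined row index fixes it; a sketch of the lines to change:
row <- (i - 1) * length(MONTHsel) + j
table[row, 1:nv] <- cr
table[row, nv + 1] <- YEARsel[i]
table[row, nv + 2] <- MONTHsel[j]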

Extend data frame column with inflation in R

I'm trying to extend some code to be able to:
1) read in a vector of prices,
2) left join that vector of prices to a data frame of years (or years and months), and
3) append/fill the prices for missing years with interpolated data based on the last year of available prices plus a specified inflation rate.
Consider an example like this one:
prices <- data.frame(year = 2018:2022,
                     wti = c(75, 80, 90, NA, NA),
                     brent = c(80, 85, 94, 93, NA))
What I need is something that will fill the missing rows of each column with the last price plus inflation (suppose 2%). I can do this in a pretty brute force way as:
i_rate <- 0.02
for (i in c(1:nrow(prices))) {
  if (is.na(prices$wti[i]))
    prices$wti[i] <- prices$wti[i - 1] * (1 + i_rate)
  if (is.na(prices$brent[i]))
    prices$brent[i] <- prices$brent[i - 1] * (1 + i_rate)
}
It seems to me there should be a way to do this using some combination of apply() and/or fill() but I can't seem to make it work.
Any help would be much appreciated.
As noted by @camille, the problem with dplyr::lag is that it doesn't work here with consecutive NAs, because it uses the "original" ith element of a vector instead of the "revised" ith element. We'd first have to create a version of lag that does this, by writing a new function:
impute_inflation <- function(x, rate) {
  output <- x
  y <- rep(NA, length = length(x)) # pre-allocate an empty vector to fill in with the loop; faster for long vectors
  for (i in seq_len(length(output))) {
    if (i == 1) {
      y[i] <- output[i] # avoid an error from attempting to use the 0th element
    } else {
      y[i] <- output[i - 1]
    }
    if (is.na(output[i])) {
      output[i] <- y[i] * (1 + rate)
    } else {
      output[i]
    }
  }
  output
}
Then it's a cinch to apply this across a bunch of variables with dplyr::mutate_at():
library(dplyr)
mutate_at(prices, vars(wti, brent), impute_inflation, 0.02)
  year    wti brent
1 2018 75.000 80.00
2 2019 80.000 85.00
3 2020 90.000 94.00
4 2021 91.800 93.00
5 2022 93.636 94.86
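On current dplyr (1.0+), where mutate_at() is superseded, the same call can be written with across(); a sketch:
mutate(prices, across(c(wti, brent), ~ impute_inflation(.x, 0.02)))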
You can use dplyr::lag to get the previous value in a given column. Your lagged values look like this:
library(dplyr)
inflation_factor <- 1.02
prices <- data_frame(year = 2018:2022,
                     wti = c(75, 80, 90, NA, NA),
                     brent = c(80, 85, 94, 93, NA)) %>%
  mutate_at(vars(wti, brent), as.numeric)
prices %>%
  mutate(prev_wti = lag(wti))
#> # A tibble: 5 x 4
#> year wti brent prev_wti
#> <int> <dbl> <dbl> <dbl>
#> 1 2018 75 80 NA
#> 2 2019 80 85 75
#> 3 2020 90 94 80
#> 4 2021 NA 93 90
#> 5 2022 NA NA NA
When a value is NA, multiply the lagged value by the inflation factor. As you can see, that doesn't handle consecutive NAs, however.
prices %>%
  mutate(wti = ifelse(is.na(wti), lag(wti) * inflation_factor, wti),
         brent = ifelse(is.na(brent), lag(brent) * inflation_factor, brent))
#> # A tibble: 5 x 3
#> year wti brent
#> <int> <dbl> <dbl>
#> 1 2018 75 80
#> 2 2019 80 85
#> 3 2020 90 94
#> 4 2021 91.8 93
#> 5 2022 NA 94.9
Or to scale this and avoid doing the same multiplication over and over, gather the data into a long format, get lags within each group (wti, brent, or any others you may have), and adjust values as needed. Then you can spread back to the original shape:
prices %>%
  tidyr::gather(key = key, value = value, wti, brent) %>%
  group_by(key) %>%
  mutate(value = ifelse(is.na(value), lag(value) * inflation_factor, value)) %>%
  tidyr::spread(key = key, value = value)
#> # A tibble: 5 x 3
#> year brent wti
#> <int> <dbl> <dbl>
#> 1 2018 80 75
#> 2 2019 85 80
#> 3 2020 94 90
#> 4 2021 93 91.8
#> 5 2022 94.9 NA
Created on 2018-07-12 by the reprex package (v0.2.0).
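Note that the gather/spread version shares the consecutive-NA limitation: wti for 2022 is still NA in the output above, because lag() sees the original NA in 2021. The impute_inflation() loop at the top is what handles runs of consecutive NAs.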
