I'm struggling to understand exactly how to compute a deflation factor for wages in a panel based on inflation.
I have the R example below to illustrate the issue.
In Wooldridge (2009:452), Introductory Econometrics, 5th ed., he creates a deflation factor by dividing 107.6 by 65.2, i.e. 107.6/65.2 ≈ 1.65, but I can't figure out how to apply this to my own panel data. Wooldridge only mentions the deflation factor in passing.
Say I have a mini panel with two people, Jane and Tom, starting in 2009 and 2006 respectively and running until 2015, with their yearly wages:
# install.packages(c("dplyr"), dependencies = TRUE)
library(dplyr)
set.seed(2)
tbl <- tibble(id = rep(c('Jane', 'Tom'), c(7, 10)),
yr = c(2009:2015, 2006:2015),
wg = c(rnorm(7, mean=5.1*10^4, sd=9), rnorm(10, 4*10^4, 12))
); tbl
#> A tibble: 17 x 3
#> id yr wg
#> <chr> <int> <dbl>
#> 1 Jane 2009 50991.93
#> 2 Jane 2010 51001.66
#> 3 Jane 2011 51014.29
#> 4 Jane 2012 50989.83
#> 5 Jane 2013 50999.28
#> 6 Jane 2014 51001.19
#> 7 Jane 2015 51006.37
#> 8 Tom 2006 39997.12
#> 9 Tom 2007 40023.81
#> 10 Tom 2008 39998.33
#> 11 Tom 2009 40005.01
#> 12 Tom 2010 40011.78
#> 13 Tom 2011 39995.29
#> 14 Tom 2012 39987.52
#> 15 Tom 2013 40021.39
#> 16 Tom 2014 39972.27
#> 17 Tom 2015 40010.54
I now get the consumer price index (CPI) (using this answer)
# install.packages(c("Quandl"), dependencies = TRUE)
CPI00to16 <- Quandl::Quandl("FRED/CPIAUCSL", collapse="annual",
start_date="2000-01-01", end_date="2016-01-01")
as_tibble(CPI00to16)
#> # A tibble: 17 x 2
#> Date Value
#> <date> <dbl>
#> 1 2016-12-31 238.106
#> 2 2015-12-31 237.846
#> 3 2014-12-31 236.290
#> 4 2013-12-31 234.723
#> 5 2012-12-31 231.221
#> 6 2011-12-31 227.223
#> 7 2010-12-31 220.472
#> 8 2009-12-31 217.347
#> 9 2008-12-31 211.398
#> 10 2007-12-31 211.445
#> 11 2006-12-31 203.100
#> 12 2005-12-31 198.100
#> 13 2004-12-31 191.700
#> 14 2003-12-31 185.500
#> 15 2002-12-31 181.800
#> 16 2001-12-31 177.400
#> 17 2000-12-31 174.600
My question is: how do I deflate Jane's and Tom's wages, cf. Wooldridge (2009), selecting 2015 as the baseline year?
Update, following MrSmithGoesToWashington's comment below:
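# Extract the calendar year so the CPI table can be joined to the panel.
CPI00to16$yr <- as.numeric(format(CPI00to16$Date,'%Y'))
# Deflation factor = CPI in the base year (2015) / CPI in each year.
CPI00to16 <- mutate(CPI00to16, deflation_factor = Value[yr == 2015]/Value)
df <- tbl %>% inner_join(as_tibble(CPI00to16[,c("yr", "deflation_factor")]), by = "yr")
df <- mutate(df, wg_defl = deflation_factor*wg, wg_diff = wg_defl-wg)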
df
#> # A tibble: 17 x 6
#> id yr wg deflation_factor wg_defl wg_diff
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Jane 2009 50991.93 1.094315 55801.21 4809.2844
#> 2 Jane 2010 51001.66 1.078804 55020.78 4019.1176
#> 3 Jane 2011 51014.29 1.046751 53399.28 2384.9910
#> 4 Jane 2012 50989.83 1.028652 52450.80 1460.9728
#> 5 Jane 2013 50999.28 1.013305 51677.83 678.5477
#> 6 Jane 2014 51001.19 1.006585 51337.04 335.8494
#> 7 Jane 2015 51006.37 1.000000 51006.37 0.0000
#> 8 Tom 2006 39997.12 1.171078 46839.76 6842.6394
#> 9 Tom 2007 40023.81 1.124860 45021.18 4997.3691
#> 10 Tom 2008 39998.33 1.125110 45002.53 5004.1909
#> 11 Tom 2009 40005.01 1.094315 43778.07 3773.0575
#> 12 Tom 2010 40011.78 1.078804 43164.86 3153.0747
#> 13 Tom 2011 39995.29 1.046751 41865.12 1869.8369
#> 14 Tom 2012 39987.52 1.028652 41133.26 1145.7322
#> 15 Tom 2013 40021.39 1.013305 40553.87 532.4863
#> 16 Tom 2014 39972.27 1.006585 40235.49 263.2225
#> 17 Tom 2015 40010.54 1.000000 40010.54 0.0000
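The same calculation can be reproduced without the Quandl download. The following is a minimal sketch, assuming the annual CPI values printed above for 2006 to 2015 are hard-coded (only so the example runs offline) and that wages are flat toy numbers rather than the simulated ones. The names cpi_tbl, base_year and wg_real are illustrative and do not come from Wooldridge. The base year is looked up by value rather than by row position, so the result does not depend on how the CPI table is sorted.

```r
library(dplyr)

# Annual CPI values copied from the FRED/CPIAUCSL output shown above.
cpi_tbl <- tibble(
  yr  = 2006:2015,
  cpi = c(203.100, 211.445, 211.398, 217.347, 220.472,
          227.223, 231.221, 234.723, 236.290, 237.846)
)

base_year <- 2015  # wages will be expressed in 2015 dollars

# Deflation factor = CPI in the base year / CPI in the wage's own year.
cpi_tbl <- cpi_tbl %>%
  mutate(deflation_factor = cpi[yr == base_year] / cpi)

# Toy nominal wages (hypothetical values, constant across years).
wages <- tibble(
  id = rep(c("Jane", "Tom"), c(7, 10)),
  yr = c(2009:2015, 2006:2015),
  wg = rep(c(51000, 40000), c(7, 10))
)

# Join on year, then scale each nominal wage into base-year dollars.
wages_real <- wages %>%
  inner_join(cpi_tbl, by = "yr") %>%
  mutate(wg_real = wg * deflation_factor)

wages_real
```

The factor is 1 in the base year and greater than 1 in earlier years with a lower CPI, which matches the deflation_factor column in the output above.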
x = list(data.frame(age = c(1:4),period = c(2000:2003)),
data.frame(age = c(5:8),period = c(1998:2001)),
data.frame(age = c(11:19),period = c(1990:1998)))
map2(x, x$period, ~cbind(.x, difference = .y-.x$age))
result:
> map2(x, x$period, ~cbind(.x, difference = .y-.x$age))
list()
Is it possible to map the function by using the elements within the same dataframe?
In your context x$period is NULL, since x is the list of data frames and the list itself has no element named "period". I think you want to access the period column within each unnamed data frame in the list. I would just use map, which passes each data frame in the list to the function in turn; inside the function you can then access each column directly, without having to pass it explicitly.
library(purrr)
library(dplyr)
x = list(data.frame(age = c(1:4),period = c(2000:2003)),
data.frame(age = c(5:8),period = c(1998:2001)),
data.frame(age = c(11:19),period = c(1990:1998)))
#Original attempt
result <- map2(x, x$period, ~cbind(.x, difference = .y-.x$age))
result
#> list()
#My solution
result2 <- map(x, function(df) cbind(df, difference = df$period - df$age))
result2
#> [[1]]
#> age period difference
#> 1 1 2000 1999
#> 2 2 2001 1999
#> 3 3 2002 1999
#> 4 4 2003 1999
#>
#> [[2]]
#> age period difference
#> 1 5 1998 1993
#> 2 6 1999 1993
#> 3 7 2000 1993
#> 4 8 2001 1993
#>
#> [[3]]
#> age period difference
#> 1 11 1990 1979
#> 2 12 1991 1979
#> 3 13 1992 1979
#> 4 14 1993 1979
#> 5 15 1994 1979
#> 6 16 1995 1979
#> 7 17 1996 1979
#> 8 18 1997 1979
#> 9 19 1998 1979
#A more readable solution using dplyr
result3 <- map(x, function(df) df %>% mutate(difference = period - age))
result3
#> [[1]]
#> age period difference
#> 1 1 2000 1999
#> 2 2 2001 1999
#> 3 3 2002 1999
#> 4 4 2003 1999
#>
#> [[2]]
#> age period difference
#> 1 5 1998 1993
#> 2 6 1999 1993
#> 3 7 2000 1993
#> 4 8 2001 1993
#>
#> [[3]]
#> age period difference
#> 1 11 1990 1979
#> 2 12 1991 1979
#> 3 13 1992 1979
#> 4 14 1993 1979
#> 5 15 1994 1979
#> 6 16 1995 1979
#> 7 17 1996 1979
#> 8 18 1997 1979
#> 9 19 1998 1979
Created on 2023-02-02 with reprex v2.0.2
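For comparison, here is a minimal base R sketch of the same idea, with no purrr or dplyr dependency. It first confirms that x$period is NULL and then computes the column inside each data frame with lapply and transform. The name result4 is only illustrative.

```r
x <- list(
  data.frame(age = 1:4,   period = 2000:2003),
  data.frame(age = 5:8,   period = 1998:2001),
  data.frame(age = 11:19, period = 1990:1998)
)

# `$` on the list looks for a list element called "period"; there is none.
is.null(x$period)

# Each data frame does have the column, so work on one data frame at a time.
result4 <- lapply(x, function(df) transform(df, difference = period - age))
result4[[2]]
```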
I'm new to R and have found similar solutions to my problem, but I'm struggling to apply these to my code. Please help...
These data are simplified, as the id variables are many:
df = data.frame(id = rep(c("a_10", "a_11", "b_10", "b_11"), each = 5),
site = rep(1:5, 4),
value = sample(1:20))
The aim is to add another column labelled "year" with values that are grouped by "id" but the true names are many - so I'm trying to simplify the code by using the ending digits.
I can use dplyr to split the dataframe into each id variable using this code (repeated for each id variable):
df %>%
select(site, id, value) %>%
filter(grepl("10$", id)) %>%
mutate(Year = "2010")
Rather than using merge to recombine the data frames into one, is there a simpler method?
I tried modifying case_when with mutate as described in a previous answer:
https://stackoverflow.com/a/63043920/12313457
mutate(year = case_when(grepl(c("10$", "11$", id) == c("2010", "2011"))))
Is something like this possible?
Thanks in advance
In case your id column has different string lengths you can use sub:
df %>%
mutate(Year = paste0("20", sub('^.*_(\\d+)$', '\\1', id)))
#> id site value Year
#> 1 a_10 1 2 2010
#> 2 a_10 2 7 2010
#> 3 a_10 3 16 2010
#> 4 a_10 4 10 2010
#> 5 a_10 5 11 2010
#> 6 a_11 1 5 2011
#> 7 a_11 2 13 2011
#> 8 a_11 3 14 2011
#> 9 a_11 4 6 2011
#> 10 a_11 5 12 2011
#> 11 b_10 1 17 2010
#> 12 b_10 2 1 2010
#> 13 b_10 3 4 2010
#> 14 b_10 4 15 2010
#> 15 b_10 5 9 2010
#> 16 b_11 1 8 2011
#> 17 b_11 2 20 2011
#> 18 b_11 3 19 2011
#> 19 b_11 4 18 2011
#> 20 b_11 5 3 2011
Created on 2022-04-21 by the reprex package (v2.0.1)
You can use substr to extract characters 3 and 4 of id (the final two digits here, because every id is four characters long) and then paste0 them onto "20" to recreate the year.
df |> dplyr::mutate(Year = paste0("20", substr(id, 3, 4)))
#> id site value Year
#> 1 a_10 1 5 2010
#> 2 a_10 2 12 2010
#> 3 a_10 3 9 2010
#> 4 a_10 4 7 2010
#> 5 a_10 5 13 2010
#> 6 a_11 1 3 2011
#> 7 a_11 2 4 2011
#> 8 a_11 3 16 2011
#> 9 a_11 4 2 2011
#> 10 a_11 5 6 2011
#> 11 b_10 1 19 2010
#> 12 b_10 2 14 2010
#> 13 b_10 3 15 2010
#> 14 b_10 4 10 2010
#> 15 b_10 5 11 2010
#> 16 b_11 1 18 2011
#> 17 b_11 2 1 2011
#> 18 b_11 3 20 2011
#> 19 b_11 4 17 2011
#> 20 b_11 5 8 2011
Created on 2022-04-21 by the reprex package (v2.0.1)
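The case_when attempt from the question can also be made to work. A minimal sketch, assuming only the two suffixes shown in the question exist: each condition goes on the left of a ~ and the value to assign goes on the right. Rows that match no condition get NA. The name df_year is illustrative.

```r
library(dplyr)

df <- data.frame(id = rep(c("a_10", "a_11", "b_10", "b_11"), each = 5),
                 site = rep(1:5, 4),
                 value = sample(1:20))

# One `condition ~ value` pair per id suffix; unmatched rows become NA.
df_year <- df %>%
  mutate(Year = case_when(
    grepl("10$", id) ~ "2010",
    grepl("11$", id) ~ "2011"
  ))

table(df_year$Year)
```

With many suffixes this needs one line per year, so the sub and substr approaches above scale better; case_when is mainly useful when the mapping from id to year is irregular.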
I have a data.frame which looks like so:
df <- data.frame(id=c("001","002","003","004"),year=c(2015,2015,2015,2015),
x1=c(15,20,25,30),x2=c(1,2,3,4))
id year x1 x2
001 2015 15 1
002 2015 20 2
003 2015 25 3
004 2015 30 4
I would like to duplicate id, x1, and x2 but change the year to end up with a data.frame that resembles the following:
id year x1 x2
001 2015 15 1
002 2015 20 2
003 2015 25 3
004 2015 30 4
001 2016 15 1
002 2016 20 2
003 2016 25 3
004 2016 30 4
I can achieve this by doing
df2 <- df %>%
mutate(year = 2016)
df3 <- rbind(df, df2)
But I am wondering if there is a more intuitive way, so that I can create duplicates for 20+ years without needing to make multiple new data.frames?
df <- data.frame(id=c("001","002","003","004"),year=c(2015,2015,2015,2015),
x1=c(15,20,25,30),x2=c(1,2,3,4))
library(tidyr)
df %>% complete(nesting(id, x1, x2), year = 2015:2016)
#> # A tibble: 8 x 4
#> id x1 x2 year
#> <chr> <dbl> <dbl> <dbl>
#> 1 001 15 1 2015
#> 2 001 15 1 2016
#> 3 002 20 2 2015
#> 4 002 20 2 2016
#> 5 003 25 3 2015
#> 6 003 25 3 2016
#> 7 004 30 4 2015
#> 8 004 30 4 2016
For more years, just change 2015:2016 to the range you need. You can also build the range dynamically with seq.
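A minimal sketch of the seq variant, assuming 21 consecutive years starting from the earliest year in the data. The names n_years, years and df_long are illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c("001", "002", "003", "004"),
                 year = c(2015, 2015, 2015, 2015),
                 x1 = c(15, 20, 25, 30), x2 = c(1, 2, 3, 4))

n_years <- 21  # hypothetical horizon: 2015 through 2035
years <- seq(from = min(df$year), by = 1, length.out = n_years)

# Every (id, x1, x2) combination present in the data is repeated for each year.
df_long <- df %>% complete(nesting(id, x1, x2), year = years)

nrow(df_long)
range(df_long$year)
```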
library(tidyverse)
df <- data.frame(id=c("001","002","003","004"),year=c(2015,2015,2015,2015),
x1=c(15,20,25,30),x2=c(1,2,3,4))
map_dfr(0:1, ~mutate(df, year = year + .x))
#> id year x1 x2
#> 1 001 2015 15 1
#> 2 002 2015 20 2
#> 3 003 2015 25 3
#> 4 004 2015 30 4
#> 5 001 2016 15 1
#> 6 002 2016 20 2
#> 7 003 2016 25 3
#> 8 004 2016 30 4
Created on 2021-06-16 by the reprex package (v2.0.0)
I use dplyr's filter() function all the time for tidying my data. Today it has stopped working when using the | operator. I am certain I have been able to use the | to filter any observation that meets any of the criteria separated by the | but it isn't working all of a sudden. Any help/guidance is greatly appreciated, as always. Reprex is below.
library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.6.2
#> Warning: package 'tidyr' was built under R version 3.6.2
#> Warning: package 'purrr' was built under R version 3.6.2
id <- c(1:20)
YEAR <- c(2009,2009,2009,2009,2010,2010,2010,2010,2011,2011,2011,2011,2012,2012,2012,2012,2013,2013,2013,2013)
df1 <- data.frame(id,YEAR)
df1
#> id YEAR
#> 1 1 2009
#> 2 2 2009
#> 3 3 2009
#> 4 4 2009
#> 5 5 2010
#> 6 6 2010
#> 7 7 2010
#> 8 8 2010
#> 9 9 2011
#> 10 10 2011
#> 11 11 2011
#> 12 12 2011
#> 13 13 2012
#> 14 14 2012
#> 15 15 2012
#> 16 16 2012
#> 17 17 2013
#> 18 18 2013
#> 19 19 2013
#> 20 20 2013
df1 <- df1 %>% dplyr::filter(YEAR == 2009|2010)
df1
#> id YEAR
#> 1 1 2009
#> 2 2 2009
#> 3 3 2009
#> 4 4 2009
#> 5 5 2010
#> 6 6 2010
#> 7 7 2010
#> 8 8 2010
#> 9 9 2011
#> 10 10 2011
#> 11 11 2011
#> 12 12 2011
#> 13 13 2012
#> 14 14 2012
#> 15 15 2012
#> 16 16 2012
#> 17 17 2013
#> 18 18 2013
#> 19 19 2013
#> 20 20 2013
Expected results would be:
df1 <- df1 %>% dplyr::filter(YEAR == 2009|2010)
df1
#> id YEAR
#> 1 1 2009
#> 2 2 2009
#> 3 3 2009
#> 4 4 2009
#> 5 5 2010
#> 6 6 2010
#> 7 7 2010
#> 8 8 2010
The following works filtering on a single condition:
df1 <- df1 %>% dplyr::filter(YEAR == 2009)
df1
#> id YEAR
#> 1 1 2009
#> 2 2 2009
#> 3 3 2009
#> 4 4 2009
We can use %in% instead of == for more than one element
library(dplyr)
df1 %>%
dplyr::filter(YEAR %in% c(2009, 2010))
With |, we need to repeat the full comparison on each side:
df1 %>%
dplyr::filter(YEAR == 2009|YEAR == 2010)
Any non-zero number is coerced to TRUE in a logical operation, so combining it with another value using | always gives TRUE:
2019|2020
#[1] TRUE
0|0
#[1] FALSE
I think your approach would also work if you write out the comparison on both sides:
df1 <- df1 %>% dplyr::filter(YEAR == 2009|YEAR == 2010)
I think of it as two separate conditions, each of which should work as a filter on its own. In your YEAR == 2009|2010, the second condition is just 2010, i.e. filter(2010), which doesn't test anything about YEAR.
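The behaviour can be checked in base R without dplyr. The sketch below rebuilds the YEAR vector from the question and compares the three forms; always_true is an illustrative name.

```r
YEAR <- rep(2009:2013, each = 4)

# `==` binds more tightly than `|`, so this is (YEAR == 2009) | 2010.
# 2010 is non-zero, hence TRUE, so every element comes out TRUE.
always_true <- YEAR == 2009 | 2010
all(always_true)

# Writing the comparison out, or using %in%, keeps only the 8 matching rows.
sum(YEAR == 2009 | YEAR == 2010)
sum(YEAR %in% c(2009, 2010))
```

This is why filter() returned all 20 rows: the condition was TRUE for every row.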
Using the dplyr full_join() operation, I am trying to perform the equivalent of a basic merge() operation in which no common variables exist (unable to satisfy the "by=" argument). This will blend two data frames and return all possible combinations.
However, the current full_join() function requires a common variable. I am unable to locate another dplyr function that can help with this. How can I perform this operation using functions specific to the dplyr library?
df_a = data.frame(department=c(1,2,3,4))
df_b = data.frame(period=c(2014,2015,2016,2017))
#This works as desired
big_df = merge(df_a,df_b)
#I'd like to perform the following in a much bigger operation:
big_df = dplyr::full_join(df_a,df_b)
#Error: No common variables. Please specify `by` param.
You can use crossing from tidyr:
library(tidyr)
crossing(df_a,df_b)
department period
1 1 2014
2 1 2015
3 1 2016
4 1 2017
5 2 2014
6 2 2015
7 2 2016
8 2 2017
9 3 2014
10 3 2015
11 3 2016
12 3 2017
13 4 2014
14 4 2015
15 4 2016
16 4 2017
If there are duplicate rows, crossing doesn't give the same result as merge.
Instead use full_join with by = character() to perform a cross-join which generates all combinations of df_a and df_b.
library("tidyverse") # version 1.3.2
# Add duplicate rows for illustration.
df_a <- tibble(department = c(1, 2, 3, 3))
df_b <- tibble(period = c(2014, 2015, 2016, 2017))
merge doesn't de-duplicate.
df_a_merge_b <- merge(df_a, df_b)
df_a_merge_b
#> department period
#> 1 1 2014
#> 2 2 2014
#> 3 3 2014
#> 4 3 2014
#> 5 1 2015
#> 6 2 2015
#> 7 3 2015
#> 8 3 2015
#> 9 1 2016
#> 10 2 2016
#> 11 3 2016
#> 12 3 2016
#> 13 1 2017
#> 14 2 2017
#> 15 3 2017
#> 16 3 2017
crossing drops duplicate rows.
df_a_crossing_b <- crossing(df_a, df_b)
df_a_crossing_b
#> # A tibble: 12 × 2
#> department period
#> <dbl> <dbl>
#> 1 1 2014
#> 2 1 2015
#> 3 1 2016
#> 4 1 2017
#> 5 2 2014
#> 6 2 2015
#> 7 2 2016
#> 8 2 2017
#> 9 3 2014
#> 10 3 2015
#> 11 3 2016
#> 12 3 2017
full_join doesn't remove duplicates either.
df_a_full_join_b <- full_join(df_a, df_b, by = character())
df_a_full_join_b
#> # A tibble: 16 × 2
#> department period
#> <dbl> <dbl>
#> 1 1 2014
#> 2 1 2015
#> 3 1 2016
#> 4 1 2017
#> 5 2 2014
#> 6 2 2015
#> 7 2 2016
#> 8 2 2017
#> 9 3 2014
#> 10 3 2015
#> 11 3 2016
#> 12 3 2017
#> 13 3 2014
#> 14 3 2015
#> 15 3 2016
#> 16 3 2017
packageVersion("tidyverse")
#> [1] '1.3.2'
Created on 2023-01-13 with reprex v2.0.2
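In dplyr 1.1.0 and later, by = character() is deprecated for this purpose in favour of a dedicated cross_join() function. A minimal sketch, assuming dplyr >= 1.1.0 is installed (newer than the version used in the reprex above); df_a_cross_b is an illustrative name.

```r
library(dplyr)  # cross_join() needs dplyr >= 1.1.0

# Same inputs as above, including the duplicated department 3.
df_a <- tibble(department = c(1, 2, 3, 3))
df_b <- tibble(period = c(2014, 2015, 2016, 2017))

# Every row of df_a paired with every row of df_b; duplicates are kept.
df_a_cross_b <- cross_join(df_a, df_b)
nrow(df_a_cross_b)
```

Like merge and full_join, this keeps duplicate rows, so it returns 16 rows rather than the 12 from crossing.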