R: Filtering data in plm

I have a pdata.frame covering 14 years x 89 observations, with 10 variables plus 4 dummies.
Those dummy variables are only used for filtering my data (when necessary).
When using Stata, I just add "if VAR==1" at the end of my command.
How can I do this with the plm package in R?
Examples
Stata code
quietly xtreg y x1 x2 if x3==1, fe
R code
plm( y ~ x1 + x2, data = PANEL, model = "within")
Must I create separate, already-filtered panels, or is it possible to filter while running plm?

You can either use the subset argument of plm (subset =) or subset the data before fitting the model.
Using the Produc dataset from the package, subsetting on region == 6:
library(plm)
data("Produc", package = "plm")
fit1 = plm(gsp ~ hwy + pc, data = Produc, subset = region == 6)
fit2 = plm(gsp ~ hwy + pc, data = subset(Produc, region == 6))
identical(coefficients(fit1), coefficients(fit2))
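Applied to the Stata pattern from the question (`if x3==1, fe`), a minimal sketch: a hypothetical dummy `ne_dummy` is added to Produc to play the role of x3, and the panel index is passed explicitly through `index`. Both routes should give the same within (FE) estimates.

```r
library(plm)
data("Produc", package = "plm")

# Hypothetical filter dummy, standing in for x3 in the question
Produc$ne_dummy <- as.integer(Produc$region == 6)

# Equivalent of Stata: xtreg gsp hwy pc if ne_dummy==1, fe
fe_subset <- plm(gsp ~ hwy + pc, data = Produc, index = c("state", "year"),
                 model = "within", subset = ne_dummy == 1)

# Same model, filtering the data first
fe_prefilt <- plm(gsp ~ hwy + pc,
                  data = Produc[Produc$ne_dummy == 1, ],
                  index = c("state", "year"), model = "within")

# Compare the coefficients of the two approaches
all.equal(coef(fe_subset), coef(fe_prefilt))
```

Using `all.equal()` rather than `identical()` tolerates tiny floating-point differences between the two code paths.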

Related

plotting an interaction term in moderated regression using MICE imputation

I'm using imputed data to test a series of regression models, including some moderation models.
Imputation
imp_data <- mice(data,m=20,maxit=20,meth='cart',seed=12345)
I then convert this to long format so I can recode / sum variables as needed, before turning it back into mids format:
impdatlong_mids<-as.mids(impdat_long)
Example model:
model1 <- with(impdatlong_mids,
lm(Outcome ~ p1_sex + p2 + p3 + p4
+ p5+ p6+ p7+ p8+ p9+ p10
+ p11+ p1_sex*p12+ p1_sex*p13 + p14)
In non-imputed data, to create a graphic representation of the significant interaction, I'd use (e.g.)
interact_plot (model=model1, pred = p1_sex, modx = p12)
This doesn't work with imputed data / mids objects.
Has anyone plotted an interaction using imputed data, and able to help or share examples?
Thanks
EDIT: Reproducible example
library(tidyverse)
library(interactions)
library(mice)
# library(reprex) does not work with this
set.seed(42)
options(warn=-1)
#---------------------------------------#
# Data preparations
# loading and editing data
d <- mtcars
d <- d %>% mutate_at(c('cyl','am'),factor)
# create missing data and impute it
mi_d <- d
nr_of_NAs <- 30
for (i in 1:nr_of_NAs) {
mi_d[sample(nrow(mi_d),1),sample(ncol(mi_d),1)] <- NA
}
mi_d <- mice(mi_d, m=2, maxit=2)
#---------------------------------------#
# regressions
#not imputed
lm_d <- lm(qsec ~ cyl*am + mpg*disp, data=d)
#imputed dataset
lm_mi <- with(mi_d,lm(qsec ~ cyl*am + mpg*disp))
lm_mi_pool <- pool(lm_mi)
#---------------------------------------#
# interaction plots
# not imputed
#continuous
interactions::interact_plot(lm_d, pred=mpg,modx=disp, interval=T,int.width=0.3)
#categorical
interactions::cat_plot(lm_d, pred = cyl, modx = am)
#---------------------------------------#
# interaction plots
# imputed
#continuous
interactions::interact_plot(lm_mi_pool, pred=mpg,modx=disp, interval=T,int.width=0.3)
# Error in model.frame.default(model) : object is not a matrix
#categorical
interactions::cat_plot(lm_mi_pool, pred = cyl, modx = am)
# Error in model.frame.default(model) : object is not a matrix
The problem seems to be that neither interact_plot(), cat_plot(), nor any function from other available packages supports (at least categorical) interaction plotting with objects of class mipo or other pooled regression outputs.
I am using the walking data from the mice package as an example. One way to get an interaction plot (or at least one type of interaction plot) is to use the gtsummary package. Under the hood, it takes model1, uses pool() from mice to average over the imputed models, and then tbl_regression() and plot() together output a plot of the model coefficients. It is tbl_regression() that calls pool().
library(mice)
library(dplyr)
library(gtsummary)
imp_data <- mice(mice::walking,m=20,maxit=20,meth='cart',seed=12345)
model1 <- with(imp_data,
lm(age ~ sex*YA))
model1 %>%
tbl_regression() %>%
plot()
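To see the numbers behind that plot, the same pooling can be done directly with mice. A minimal sketch, with m and maxit reduced from 20 to 5 (my choice, only to keep it quick): pool() combines the per-imputation fits by Rubin's rules, so each pooled point estimate is the mean of the per-imputation estimates, which the last line checks.

```r
library(mice)

# Smaller m/maxit than in the answer above, only to keep the example fast
imp_data <- mice(mice::walking, m = 5, maxit = 5, method = "cart",
                 seed = 12345, printFlag = FALSE)
model1 <- with(imp_data, lm(age ~ sex * YA))

# Rubin's rules: pooled coefficients, standard errors and confidence intervals
pooled <- pool(model1)
est <- summary(pooled, conf.int = TRUE)
est

# The pooled point estimates are the averages over the m fitted models
per_imp <- sapply(model1$analyses, coef)
all.equal(est$estimate, unname(rowMeans(per_imp)))
```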
The package emmeans allows you to extract interaction effects from a mira object. Here is a gentle introduction. After that, the interactions can be plotted with an appropriate ggplot. This example is for categorical variables but could be extended to the continuous case; after the emmeans part, things get relatively straightforward.
library(ggplot2)
library(ggstance)
library(emmeans)
library(khroma)
library(jtools)
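# cyl and am are the variables converted to factors in the question;
# gear and carb are numeric in mtcars, so emmeans would only evaluate them at their means
lm_mi <- with(mi_d, lm(qsec ~ cyl*am))
#extracting interaction effects
emcatcat <- emmeans(lm_mi, ~cyl*am)
tidy <- as_tibble(emcatcat)
#plotting
pd <- position_dodge(0.5)
ggplot(tidy, aes(y=cyl, x=emmean, colour=am)) +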
geom_linerangeh(aes(xmin=lower.CL, xmax=upper.CL), position=pd,size = 2) +
geom_point(position=pd,size = 4)+
ggtitle('Interactions') +
labs (x = "aggregated interaction effect") +
scale_color_bright() +
theme_nice()
This can be extended to a three-way interaction plot with facet_grid(), as long as the third interaction term is also categorical.
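For the continuous mpg x disp interaction from the question, one alternative to emmeans (my own suggestion, not taken from the answers above) is to predict from each imputed fit stored in the mira object's `analyses` list and average the predictions. For a linear model this gives the same point predictions as the pooled coefficients, because pool() averages the coefficients and predictions are linear in them; it does not produce pooled confidence bands. The grid values (disp at its mean plus or minus one SD, factors at their reference levels) are arbitrary choices for illustration.

```r
library(mice)

# Rebuild the question's imputed data (m and maxit are illustrative)
set.seed(42)
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am)
mi_d <- d
for (i in 1:30) mi_d[sample(nrow(mi_d), 1), sample(ncol(mi_d), 1)] <- NA
imp <- mice(mi_d, m = 5, maxit = 5, printFlag = FALSE)
fits <- with(imp, lm(qsec ~ cyl * am + mpg * disp))

# Prediction grid: mpg varies, disp at low/mean/high, factors held at reference level
disp_vals <- mean(d$disp) + c(-1, 0, 1) * sd(d$disp)
grid <- expand.grid(mpg  = seq(min(d$mpg), max(d$mpg), length.out = 50),
                    disp = disp_vals)
grid$cyl <- factor(levels(d$cyl)[1], levels = levels(d$cyl))
grid$am  <- factor(levels(d$am)[1],  levels = levels(d$am))

# One column of predictions per imputed fit, then the average across imputations
pred_mat <- sapply(fits$analyses, predict, newdata = grid)
grid$fit <- rowMeans(pred_mat)

# Simple interaction plot: one line per moderator value
plot(NULL, xlim = range(grid$mpg), ylim = range(grid$fit),
     xlab = "mpg", ylab = "Predicted qsec")
for (k in seq_along(disp_vals)) {
  rows <- grid$disp == disp_vals[k]
  lines(grid$mpg[rows], grid$fit[rows], lty = k)
}
legend("topright", legend = paste("disp =", round(disp_vals)),
       lty = seq_along(disp_vals))
```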

Error in model.frame.default: variable lengths differ, R predict function

This is not a new question. I have seen several proposed solutions elsewhere and tried them, but none works, so I am asking here.
How can I fix this error? I am using R version 3.5.3 (2019-03-11)
Error in model.frame.default(data = ov_val, formula = Surv(time = ov_dev$futime, : variable lengths differ (found for 'rx')
Here is a reproducible example:
library(survival)
library(survminer)
library(dplyr)
# Create fake development dataset
ov_dev <- ovarian[1:13,]
# Create fake validation dataset
ov_val <- ovarian[13:26,]
# Run cox model
fit.coxph <- coxph(Surv(time = ov_dev$futime, event = ov_dev$fustat) ~ rx + resid.ds + age + ecog.ps, data = ov_dev)
summary(fit.coxph)
# Where error occurs
p <- log(predict(fit.coxph, newdata = ov_val, type = "expected"))
I think this has happened because you have used ov_dev$futime and ov_dev$fustat in your model specification rather than just using futime and fustat. That means that when you come to predict, the model is using the ov_dev data for the dependent variable but ov_val for the independent variables, which are of different length (13 versus 14). Just remove the data frame prefix and trust the data parameter:
library(survival)
library(survminer)
library(dplyr)
# Create fake development dataset
ov_dev <- ovarian[1:13,]
# Create fake validation dataset
ov_val <- ovarian[13:26,]
# Run cox model
fit.coxph <- coxph(Surv(futime, fustat) ~ rx + resid.ds + age + ecog.ps,
data = ov_dev)
p <- log(predict(fit.coxph, newdata = ov_val, type = "expected"))
p
#> [1] 0.4272783 -0.1486577 -1.8988833 -1.1887086 -0.8849632 -1.3374428
#> [7] -1.2294725 -1.5021708 -0.3264792 0.5633839 -3.0457613 -2.2476071
#> [13] -1.6754877 -3.0691996
Created on 2020-08-19 by the reprex package (v0.3.0)
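The same pitfall can be reproduced in base R with lm(), where it shows up as a warning instead of an error: a predictor written with a data-frame prefix is looked up in that data frame, so newdata is effectively ignored. A minimal sketch using mtcars (chosen only for illustration):

```r
# Formula with data-frame prefixes (the pattern from the question)
fit_bad <- lm(mtcars$mpg ~ mtcars$wt)
# Formula with bare column names plus the data argument
fit_good <- lm(mpg ~ wt, data = mtcars)

new_cars <- data.frame(wt = c(2.5, 3.5))

# The prefixed predictor is found in mtcars, so all 32 fitted rows come back
# (R warns that 'newdata' had 2 rows but variables found have 32 rows)
p_bad <- suppressWarnings(predict(fit_bad, newdata = new_cars))
# The bare name is looked up in newdata, as intended
p_good <- predict(fit_good, newdata = new_cars)

length(p_bad)   # 32
length(p_good)  # 2
```

coxph() with type = "expected" also needs the response, which is why the prefixed Surv() term turns this into the "variable lengths differ" error.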

Stargazer one line per data set

I am running regressions using various subsets of a data set and a number of dependent variables.
An example using attitude data:
library(stargazer)
#REGRESSIONS USING DATASET 1
linear1.1 <- lm(rating ~ complaints, data = attitude) #dependent 1
linear1.2 <- lm(privileges ~ complaints, data = attitude) #dependent 2
#REGRESSIONS USING DATASET 2
linear2.1 <- lm(rating ~ complaints, data = attitude[1:15,]) #dependent 1
linear2.2 <- lm(privileges ~ complaints, data = attitude[1:15,]) #dependent 2
As you can see, both dependent variables, rating and privileges, are used in regressions for both subsets of the data. Using a standard stargazer approach produces the following table:
stargazer::stargazer(linear1.1,linear1.2,linear2.1,linear2.2,
omit.stat = "all",
keep = "complaints")
Each column represents one of the regression models. However, I'd like to have each column represent one dependent variable. Each subset of the data should represent one row:
I have produced this table by hand. Does anyone know whether it's possible to achieve this using stargazer? I have a lot of regression subsets and dependent variables, so a highly automatic solution is appreciated. Thanks!
I wonder whether this small modification of the answer to "Exporting output of custom multiple regressions from R to Latex" will suit you:
library(stargazer)
library(broom)
## generate dummy data
set.seed(123)
x <- runif(1000)
z <- x^0.5
y <- x + z + rnorm(1000, sd=.05)
model1 <- lm(y ~ x)
model2 <- lm(y ~ z)
## transform model summaries into dataframes
tidy(model1) -> model1_tidy
tidy(model2) -> model2_tidy
output <- rbind(model1_tidy,model2_tidy)
stargazer(output, type='text', summary=FALSE)
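For the exact layout asked for (one row per data subset, one column per dependent variable), the table can also be assembled automatically in base R before handing it to stargazer. A minimal sketch using the attitude example from the question; the "coef (se)" cell format and the subset labels are my own choices:

```r
# Data subsets (one table row each) and dependent variables (one column each)
subsets <- list(
  "Full data" = attitude,
  "Rows 1-15" = attitude[1:15, ]
)
dvs <- c("rating", "privileges")

# Fit one DV on one subset and format the complaints coefficient as "coef (se)"
cell <- function(dv, dat) {
  fit <- lm(reformulate("complaints", response = dv), data = dat)
  est <- coef(summary(fit))["complaints", c("Estimate", "Std. Error")]
  sprintf("%.3f (%.3f)", est[1], est[2])
}

# Rows = subsets, columns = dependent variables
tab <- t(sapply(subsets, function(dat) sapply(dvs, cell, dat = dat)))
tab
```

The resulting character matrix can then be passed to stargazer(tab, type = "text", summary = FALSE), as the answer does with its rbind output; adding subsets or DVs only means extending the two lists.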

Running Cox.ph model with GAMM mixed models in R

I am new to GAMs and splines. I am running a survival model in which I want to model the time to event as a function of the subjects' age, controlling for two variables. Here is the example using a conventional survival model with coxph:
library(survival)
fit_cox<-coxph(Surv(time, event)~ age+ var1 + var2, data=mydata)
I suspect that the relationships of var1 and var2 with the outcome are not linear, and I am also thinking of including random effects in my model (moving to mixed-effects models with gamm).
I have tried this syntax:
library(mgcv)
fit_surv<-Surv(time, event)
fit_gam<-gam(fit_surv ~ age + s(var1) + s(var2), data = mydata, family = cox.ph())
And to include the random effects:
library(gamm4)
fit_gamm <- gamm4(fit_surv ~ age + s(var1) + s(var2), random = ~(1 | ID), data = mydata, family = cox.ph)
My problems are:
1. For fit_gam, I do not know how to get a summary of the model, see the coefficient table, or plot it. I get this error:
summary(fit_gam)
"Error in Ops.Surv(w, object$y) : Invalid operation on a survival time"
2. I could not run fit_gamm, either because of a syntax error or because a Surv object cannot be included. The error is:
"Error in ncol(x) : object 'x' not found"
Thank you in advance!
As mentioned in the comments, simple Gaussian frailties (a Gaussian random intercept) can be specified directly within the mgcv::gam call, e.g. by adding ... + s(ID, bs = "re") + ... to your formula (note that ID has to be a factor variable).
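A minimal sketch of that mgcv route, using the lung data from survival in place of the question's mydata (var1, var2 and ID are not available). Note that mgcv's cox.ph family takes the event time as the response and the event indicator (1 = event, 0 = censored) through the weights argument rather than a Surv object; passing a Surv object as the response is likely what breaks summary(fit_gam) in the question.

```r
library(mgcv)
library(survival)  # only used here for the lung data

# Complete cases for the variables used below
lung2 <- na.omit(lung[, c("time", "status", "age", "ph.ecog", "inst")])
lung2$event <- as.integer(lung2$status == 2)  # lung codes status 1 = censored, 2 = dead
lung2$inst  <- factor(lung2$inst)             # bs = "re" needs a factor

# cox.ph: event time as response, event indicator as weights, institution frailty
fit_gam <- gam(time ~ s(age) + ph.ecog + s(inst, bs = "re"),
               family = cox.ph(), weights = event, data = lung2)

summary(fit_gam)          # parametric coefficients and smooth-term tests
plot(fit_gam, pages = 1)  # smooth of age and QQ plot of the random effect
```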
Alternatively, you can transform the data to the so-called piece-wise exponential data (PED) format and fit the model with any GA(M)M software; such models are called piece-wise exponential additive mixed models (PAMMs). Here is an example:
library(coxme)
library(mgcv)
library(pammtools)
lung <- lung %>% mutate(inst = as.factor(inst)) %>% na.omit()
## cox model with gaussian frailty
cme <- coxme(Surv(time, status) ~ ph.ecog + (1|inst), data=lung)
## pamm with gaussian frailty
ped <- lung %>% as_ped(Surv(time, status)~., id="id")
pam <- gam(ped_status ~ s(tend) + ph.ecog + s(inst, bs = "re"),
data = ped, family = poisson(), offset = offset)
## visualize random effect:
gg_re(pam)
# compare coxme and pamm estimates:
re <- tidy_re(pam)
plot(cme$frail$inst, re$fit, las=1, xlab="Frailty (cox)", ylab="Frailty (PAM)")
abline(0, 1)
## with gamm4
library(gamm4)
#> Loading required package: Matrix
#> Loading required package: lme4
#>
#> Attaching package: 'lme4'
#> The following object is masked from 'package:nlme':
#>
#> lmList
#> This is gamm4 0.2-5
pam2 <- gamm4(ped_status ~ s(tend) + ph.ecog, random = ~(1|inst),
family = poisson(), offset = ped$offset, data = ped)
lattice::qqmath(ranef(pam2$mer)$inst[, 1])
Created on 2018-12-08 by the reprex package (v0.2.1)

Plot Effects of Variables in Interaction Terms

I would like to plot the effects of variables in interaction terms, using panel data and a FE model.
I have various interaction effects in my equation, for example this one here:
FIXED1 <- plm(GDPPCgrowth ~ FDI * PRIVCR, data = dfp)
I can only find solutions for lm, but not for plm.
So on the x-axis there should be PRIVCR and on the y-axis the effect of FDI on growth.
Thank you for your help!
Lisa
I am not aware of a package that supports plm objects directly. Since you are asking about FE models, you can use the least-squares dummy variable (LSDV) approach to FE and estimate the model with lm, which gives an lm object that works with the effects package. Here is an example with the Grunfeld data:
library(plm)
library(effects)
data("Grunfeld", package = "plm")
mod_fe <- plm(inv ~ value + capital + value:capital, data = Grunfeld, model = "within")
Grunfeld[ , "firm"] <- factor(Grunfeld[ , "firm"]) # needs to be factor in the data NOT in the formula [required by package effects]
mod_lsdv <- lm(inv ~ value + capital + value:capital + firm, data = Grunfeld)
coefficients(mod_fe) # estimates are the same
coefficients(mod_lsdv) # estimates are the same
eff_obj <- effects::Effect(c("value", "capital"), mod_lsdv)
plot(eff_obj)
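For the plot described in the question (moderator on the x-axis, effect of the other variable on the y-axis), the marginal effect can also be computed by hand from the plm fit, without converting to lm. A minimal sketch on the Grunfeld analogue: with value, capital and value:capital in the model, the effect of value is b_value + b_value:capital * capital, and its variance follows from the coefficient covariance matrix. The 1.96 multiplier (normal-approximation 95% band) is my choice, and vcov() here is the default, non-robust covariance; a clustered one such as plm's vcovHC() could be substituted.

```r
library(plm)
data("Grunfeld", package = "plm")

# Same specification as FDI * PRIVCR in the question
mod_fe <- plm(inv ~ value * capital, data = Grunfeld, model = "within")
b <- coef(mod_fe)
V <- vcov(mod_fe)

# Marginal effect of value over a grid of capital values
cap_grid <- seq(min(Grunfeld$capital), max(Grunfeld$capital), length.out = 100)
me <- b["value"] + b["value:capital"] * cap_grid
se <- sqrt(V["value", "value"] +
           cap_grid^2 * V["value:capital", "value:capital"] +
           2 * cap_grid * V["value", "value:capital"])

plot(cap_grid, me, type = "l", xlab = "capital",
     ylab = "Marginal effect of value on inv")
lines(cap_grid, me + 1.96 * se, lty = 2)
lines(cap_grid, me - 1.96 * se, lty = 2)
abline(h = 0, col = "grey")
```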
