Panel data with binary dependent variable in R

Is it possible to do regressions in R using a panel data set with a binary dependent variable? I am familiar with using glm for logit and probit and plm for panel data, but am not sure how to combine the two. Are there any existing code examples?
EDIT
It would also be helpful if I could figure out how to extract the matrix that plm() is using when it does a regression. For instance, you could use plm to do fixed effects, or you could create a matrix with the appropriate dummy variables and then run that through glm(). In a case like this, however, it is annoying to generate the dummies yourself and it would be easier to have plm do it for you.

The package "pglm" might be what you need.
http://cran.r-project.org/web/packages/pglm/pglm.pdf
This package provides glm-like models (including binomial logit and probit) for panel data.
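For instance, a random-effects panel probit could look roughly like this (a minimal sketch; the variable and index names are placeholders, so check the pglm documentation for the exact arguments):
library(pglm)
# random-effects panel probit; y is binary, id/year are placeholder index variables
fit <- pglm(y ~ x1 + x2, data = mydata,
            family = binomial("probit"),
            model = "random", index = c("id", "year"))
summary(fit)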

Maybe the package lme4 is what you are looking for.
It is possible to fit generalized linear mixed models (for example, a logit with random group effects) using the command glmer.
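For example, a random-intercept logit might look like this (a sketch with placeholder variable names; id is the panel identifier):
library(lme4)
# binary outcome y, regressor x, panel identifier id
fit <- glmer(y ~ x + (1 | id), data = mydata, family = binomial(link = "logit"))
summary(fit)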
But you should be aware that panel models with a binary dependent variable behave quite differently from the usual linear models.
This site may be helpful.
Best regards,
Manoel

model.frame(plmmodel)
will give you the data frame that is actually used by plm for fitting the model (i.e. after list-wise deletion if you have NAs, etc.)
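If you want the design matrix itself (with the within or first-difference transformation already applied), plm objects should also support model.matrix(). A quick sketch with placeholder variables:
fe <- plm(y ~ x1 + x2, data = pdata, model = "within")
head(model.frame(fe))    # data actually used, after listwise deletion
head(model.matrix(fe))   # transformed regressor matrix passed to the estimator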
I don't think that plm has implemented functions to estimate models with binary outcomes, but I may be wrong. Check out the reference manual at: http://cran.r-project.org/web/packages/plm/index.html
If I'm right, this would suggest that you can't "combine the two" without considerable work in extending the functions provided by plm.

Related

R, mitools::MIcombine, what is the reason for no p-values?

I am currently running a simple linear regression model with 5 multiply imputed datasets in R.
E.g. model <- with(imp, lm(outcome ~ exposure))
To pool the summary estimates I could use the command summary(mitools::MIcombine(model)) from the mitools package. However, this does not give results for p-values. I could also use the command summary(pool(model)) from the mice package and this does give results for p-values.
Because of this, I am wondering if there is a specific reason why MIcombine does not produce p-values?
After looking through the documentation, there doesn't seem to be a particular reason why the mitools library doesn't provide p-values, although the package's focus is on imputation rather than on model results.
However, you don't need either of these packages to see your results, along with the per-model p-values. I started writing this as a comment but decided to include the code. In case you weren't aware, you can use base R's summary() on each fitted model. I realize that the output of mice, like that of mitools, is pooled across imputations, whereas this shows each fit separately, but it seemed important enough to mention as well.
If the output of your call is model, then this will work.
library(tidyverse)   # for purrr::map()
map(seq_along(model), ~ summary(model[[.x]]))   # [[ ]] extracts each lm fit; [ ] would return a one-element list
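If you specifically want p-values for the pooled MIcombine() estimates, one rough workaround (a sketch; it uses a plain normal approximation rather than the Barnard-Rubin degrees-of-freedom adjustment that mice::pool() applies) is to build Wald tests from the combined coefficients and variance:
library(mitools)
pooled <- MIcombine(model)
est <- coef(pooled)                 # combined estimates
se  <- sqrt(diag(vcov(pooled)))     # if coef()/vcov() are not available, try pooled$coefficients and pooled$variance
p   <- 2 * pnorm(abs(est / se), lower.tail = FALSE)
cbind(estimate = est, std.error = se, p.value = p)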

Is there an R function for creating an interaction plot of a panelAR model?

In order to strengthen the interpretation of an interaction term I would like to create an interaction plot.
Starting Point: I am analyzing a panel data frame with which I fitted a feasible generalized least squares model by using the panelAR function. It includes an interaction term of two continuous variables.
What I want to do: To create an interaction plot, e.g. following the style of “plot_model” from the package sjPlot (see Three-Way-Interactions: link).
Problem: I could neither find any package which supports the type of my model nor a different way to get a plot.
Question: Is there any workaround which can be used for obtaining an interaction plot or even a package which supports a panelAR model?
Since I am quite new to R, I would appreciate any kind of help. Thank you very much.
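One generic workaround (a sketch, not specific to panelAR; it assumes the fitted object exposes coef() and vcov(), and the names x, z, df, and fit are placeholders) is to build the plot by hand: compute the marginal effect of x over a grid of moderator values z and add a delta-method confidence band.
b <- coef(fit); V <- vcov(fit)                         # fit: the panelAR model
zgrid <- seq(min(df$z), max(df$z), length.out = 50)    # df: the underlying data
me <- b["x"] + b["x:z"] * zgrid                        # marginal effect of x given z
se <- sqrt(V["x", "x"] + zgrid^2 * V["x:z", "x:z"] + 2 * zgrid * V["x", "x:z"])
plot(zgrid, me, type = "l", xlab = "z", ylab = "Marginal effect of x")
lines(zgrid, me - 1.96 * se, lty = 2)
lines(zgrid, me + 1.96 * se, lty = 2)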

Test for Multicollinearity in Panel Data R

I am running a panel data regression using the plm package in R and want to control for multicollinearity between the explanatory variables.
I know there is the vif() function in the car package; however, as far as I know, it cannot deal with panel data output.
The plm package can do other diagnostics, such as unit root tests, but I found no method to check for multicollinearity.
Is there a way to calculate something similar to the VIF, or can I just treat each variable as a time series, leave out the panel information, and run the tests using the car package?
I cannot disclose the data, but the problem should be relevant to all panel data models.
The dimension is roughly 1,000 observations, over 50 time-periods.
The code I use looks like this:
pdata <- plm.data(RegData, index=c("id","time"))
fixed <- plm(Y~X, data=pdata, model="within")
and then
vif(fixed)
returns an error.
Thank you in advance.
This question has been asked with reference to other statistical packages such as SAS https://communities.sas.com/thread/47675 and Stata http://www.stata.com/statalist/archive/2005-08/msg00018.html, and the common answer has been to use a pooled model to get the VIFs. The logic is that since multicollinearity is only about the independent variables, there is no need to control for individual effects using panel methods.
Here's some code extracted from another site:
mydata <- read.csv("US Panel Data.csv")
pdata  <- pdata.frame(mydata, index = c("id", "t"))   # plm.data() is deprecated in current plm
model  <- plm(Return ~ ESG + Beta + Market.Cap + PTBV + Momentum +
                Dummy1 + Dummy2 + Dummy3 + Dummy4 + Dummy5 +
                Dummy6 + Dummy7 + Dummy8 + Dummy9,
              data = pdata, model = "pooling")
vif(model)
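If vif() refuses the plm object, an alternative (a sketch using the same placeholder variables) is to fit the identical pooled specification with lm() and run car::vif() on that; the VIFs depend only on the regressors, so the panel structure does not matter here:
library(car)
ols <- lm(Return ~ ESG + Beta + Market.Cap + PTBV + Momentum +
            Dummy1 + Dummy2 + Dummy3 + Dummy4 + Dummy5 +
            Dummy6 + Dummy7 + Dummy8 + Dummy9,
          data = mydata)
vif(ols)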

How to do feature selection with randomForest package?

I'm using randomForest in order to find out the most significant variables. I was expecting some output that defines the accuracy of the model and also ranks the variables based on their importance. But I am a bit confused now. I tried randomForest and then ran importance() to extract the importance of variables.
But then I saw another command, rfcv (Random Forest Cross-Validation for feature selection), which I suppose should be the most appropriate for this purpose. The questions I have regarding it are: how do I get the list of the most important variables? How do I see the output after running it? Which command should I use?
Another thing: What is the difference between randomForest and predict.randomForest?
I am not very familiar with randomForest and R, therefore any help would be appreciated.
Thank you in advance!
After you have built a randomForest model, you use predict.randomForest to apply that model to new data, e.g. build a random forest with training data, then run your validation data through that model with predict() (which dispatches to predict.randomForest).
As for rfcv, there is an option recursive which (from the help) controls "whether variable importance is (re-)assessed at each step of variable reduction".
It's all in the help file.
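As a rough illustration of the whole workflow (using the built-in iris data; your own data and formula will of course differ):
library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)
importance(rf)                          # importance scores for each variable
varImpPlot(rf)                          # ranked importance plot
pred <- predict(rf, newdata = iris)     # predict() dispatches to predict.randomForest
# cross-validated error as the least important variables are dropped
cv <- rfcv(trainx = iris[, -5], trainy = iris$Species, cv.fold = 5)
cv$error.cv                             # error for each number of retained variables; pair with importance() to see which they are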

fixed effect, instrumental variable regression like xtivreg in stata (FE IV regression)

Does anyone know of an R package that supports fixed-effects, instrumental-variable regression like xtivreg in Stata (FE IV regression)? Yes, I can just include dummy variables but that just gets impossible when the number of groups increases.
Thanks!
I can just include dummy variables but that just gets impossible when the number of groups increases
By "impossible," do you mean "computationally impossible"? If so, check out the plm package, which was designed to handle cases that would otherwise be computationally infeasible, and which permits fixed-effects IV.
Start with the plm vignette. It will quickly make clear whether plm is what you're looking for.
Update 2018 December 03: the estimatr package will also do what you want. It's faster and easier to use than the plm package.
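For what it's worth, a minimal sketch of both routes (placeholder names: x1 endogenous, z1 its excluded instrument, x2 exogenous, id the group index; check each package's documentation for the exact formula interface):
library(plm)
# within (fixed-effects) IV estimation; the part after | lists the instruments
fe_iv <- plm(y ~ x1 + x2 | z1 + x2, data = pdata, model = "within")

library(estimatr)
# IV with the group fixed effects absorbed rather than entered as dummies
fe_iv2 <- iv_robust(y ~ x1 + x2 | z1 + x2, data = mydata, fixed_effects = ~ id)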
As you may know, for many fixed-effects and random-effects models (FE and RE in the econometrics/education sense, since the definitions in statistics are different), you can create an equivalent SEM (structural equation modeling) model. There are two packages in R that can be used for that purpose: 1) sem, 2) lavaan.
Another solution is to use SAS. In SAS, you can use PROC GLM, which provides the ABSORB statement; it automatically takes care of the dummies as well as computing (x - xbar) for each observation.
Hope it helps.
Try the ivreg command from the AER package.
