I have unbalanced panel data and I want to fit this type of regression:
Pr(y=1|xB) = G(xB+a)
where "y" is a binary variable, "x" vector of explanatory variables and "B" my coeff.
I want to implement random effect model with maximum likelihood estimation, however I didn't understand what I need to change in the plm function (of package plm) CRAN guide (vignette). As far I used this code:
library(plm)
p_finale <- plm.data(p_finale, index=c("idnumber","Year"))
attach(p_finale)
y <- (TotalDebt_dummy)
X_tot <- cbind(Size,ln_Age,liquidity,Asset_Tangibility,profitability,growth, sd_cf_risk1, family_dummy,family_manager,
sd_cf_risk1*family_dummy,
Ateco_A,Ateco_C,Ateco_D,Ateco_E,Ateco_F,Ateco_G,Ateco_H,Ateco_I,Ateco_J,Ateco_M,Ateco_N,
Ateco_Q,Ateco_R)
model1 <- plm(y~X_tot+factor(Year),data = p_finale, model="random")
I included the whole code, but the only thing I believe needs to be changed is the last row in plm.
Function plm from package plm does not use a maximum-likelihood approach for model estimation. It uses a GLS approach as is common in econometrics.
Please see the section about plm versus nlme and lme4 in the package's first vignette ("Panel data econometrics with R: the plm package" (https://cran.rstudio.com/web/packages/plm/vignettes/A_plmPackage.html). The section explains the differences between the appraoches and has code examples for boths (and refers to packages nlme and lme4 for the maximum-likelihood approach).
Related
I'm trying to use predictions from a random survival forest computed using Ranger to calculate a c-index at specific time points. I know this can be done easily for a coxph model with the following code:
cox_model = coxph(Surv(time, status == 1) ~ ., data = train)
c_index_test <- pec::cindex(cox_model, formula = Cox_model$formula, data=test, eval.times= c(30, 90, 730))
#want to evaluate at 1 month, 3 months, and 2 years
However, although I can calculate a c-index at these time points easily with a random forest generated using rfsrc(), I haven't been able to do it using ranger.
In addition to the pec cindex() function (which doesn't work with objects of class "ranger", I've also tried the concordance.index function (part of the survcomp package) and tried different combinations of using the predict.ranger function to generate survival probability predictions, but nothing has worked.
If anyone can provide code as to how to calculate a the c-index of a ranger RSF (at specific time points and on an external validation set) I would appreciate it immensely!!! I've been able to do it with randomforestSRC but it just takes so long that often my R session will time out and I haven't actually been able to get ANY results with runs having >10 trees...
The ranger packages computes Harrell’s c-index, which is similar to the concordance statistic. If you have a fitted model rf, the attribute prediction.error is equivalent to 1 - Harrell's c-index. Have a look at the following link for more details.
I have created a regression model using ols() from rms package
data_Trans <- ols(Check ~ rcs(data_XVar,6))
Since this is built using restricted cubic spline with 6 knots I get 5 coefficients with one intercept.
Now I could not understand how to apply this model over new sets of coefficient values. Any example to perform this would be really helpful.Further, I am not sure whether we have specify any knot positions or the model saves the previous knot positions saved while building the model.
I have run a few models in for the penalized logistic model in R using the
logistf package. I however wish to plot some forest plots for the data.
The sjPlot package : http://www.strengejacke.de/sjPlot/custplot/
gives excellent function for the glm output, but no function for the logistf function.
Any assistance?
The logistf objects differ in their structure compared to glm objects, but not too much. I've added support for logistf-fitted models, however, 1) model summaries can't be printed and b) predicted probability plots are currently not supported with logistf-models.
I'll update the code on GitHub tonight, so you can try the updated sjp.glm function...
library(sjPlot)
library(logistf)
data(sex2)
fit<-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sex2)
# for this example, axisLimits need to be specified manually
sjp.glm(fit, axisLimits = c(0.05, 25), transformTicks = T)
I want to have a classification table for logistic regression using lrm function in rms package and then plot the roc curve.I have perfomed this using glm function.Example code
train<-sample(dim(data)[1],.8*dim(data)[1]) #80-20 training/test
datatrain<-data[train,]
datatest<-data[-train,]
fit<-glm(Target ~ ., data=datatrain,family=binomial()) #Target is 0/1 variable
prob=predict(fit,type=c("response"),datatest)
datatest$prob=prob
library(pROC)
ROC <- roc(Target==1 ~ prob, data = datatest)
plot(ROC)
confusion<-table(prob>0.5,datatest$Target)
errorrate<-sum(diag(confusion))/sum(confusion)
errorrate
How to get the confusion matrix using lrm function?
The lrm function returns a fit object that inherits from the glm-class. That is not explicitly stated in the lrm help page, but it's easy enough to verify. After running the setup code in the first example on the ?lrm page
> f <- lrm(ch ~ age)
> class(f)
[1] "lrm" "rms" "glm"
So you should be able to use the ordinary predict method you were using above. Prof Harrell advises against using split-sample validation and the use of ROC curves for model comparison. He provides mechanisms for better methods in his package.
I am trying to learn R after using Stata and I must say that I love it. But now I am having some trouble. I am about to do some multiple regressions with Panel Data so I am using the plm package.
Now I want to have the same results with plm in R as when I use the lm function and Stata when I perform a heteroscedasticity robust and entity fixed regression.
Let's say that I have a panel dataset with the variables Y, ENTITY, TIME, V1.
I get the same standard errors in R with this code
lm.model<-lm(Y ~ V1 + factor(ENTITY), data=data)
coeftest(lm.model, vcov.=vcovHC(lm.model, type="HC1))
as when I perform this regression in Stata
xi: reg Y V1 i.ENTITY, robust
But when I perform this regression with the plm package I get other standard errors
plm.model<-plm(Y ~ V1 , index=C("ENTITY","YEAR"), model="within", effect="individual", data=data)
coeftest(plm.model, vcov.=vcovHC(plm.model, type="HC1))
Have I missed setting some options?
Does the plm model use some other kind of estimation and if so how?
Can I in some way have the same standard errors with plm as in Stata with , robust
By default the plm package does not use the exact same small-sample correction for panel data as Stata. However in version 1.5 of plm (on CRAN) you have an option that will emulate what Stata is doing.
plm.model<-plm(Y ~ V1 , index=C("ENTITY","YEAR"), model="within",
effect="individual", data=data)
coeftest(plm.model, vcov.=function(x) vcovHC(x, type="sss"))
This should yield the same clustered by group standard-errors as in Stata (but as mentioned in the comments, without a reproducible example and what results you expect it's harder to answer the question).
For more discussion on this and some benchmarks of R and Stata robust SEs see Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R.
See also:
Clustered standard errors in R using plm (with fixed effects)
Is it possible that your Stata code is different from what you are doing with plm?
plm's "within" option with "individual" effects means a model of the form:
yit = a + Xit*B + eit + ci
What plm does is to demean the coefficients so that ci drops from the equation.
yit_bar = Xit_bar*B + eit_bar
Such that the "bar" suffix means that each variable had its mean subtracted. The mean is calculated over time and that is why the effect is for the individual. You could also have a fixed time effect that would be common to all individuals in which case the effect would be through time as well (that is irrelevant in this case though).
I am not sure what the "xi" command does in STATA, but i think it expands an interaction right ? Then it seems to me that you are trying to use a dummy variable per ENTITY as was highlighted by #richardh.
For your Stata and plm codes to match you must be using the same model.
You have two options:(1) you xtset your data in stata and use the xtreg option with the fe modifier or (2) you use plm with the pooling option and one dummy per ENTITY.
Matching Stata to R:
xtset entity year
xtreg y v1, fe robust
Matching plm to Stata:
plm(Y ~ V1 + as.factor(ENTITY) , index=C("ENTITY","YEAR"), model="pooling", effect="individual", data=data)
Then use vcovHC with one of the modifiers. Make sure to check this paper that has a nice review of all the mechanics behind the "HC" options and the way they affect the variance covariance matrix.
Hope this helps.