Is there a package that can calculate the Variance Inflation Factor (VIF) in Julia, similar to VIF from the fmsb package in R? If there isn't, how would I do it manually (I'm still quite confused by all the Julia Statistics packages and what assumptions they make)?
I couldn't find a package or function to do it, but I figured out a way to do it manually:
using RDatasets, DataFrames, CSV, GLM
airquality = rename(dataset("datasets", "airquality"), "Solar.R" => "Solar_R")
model = lm(@formula(Wind ~ Temp + Solar_R), airquality)  # regress the variable of interest on the other predictors
print(1 / (1 - r2(model)))  # VIF = 1 / (1 - R²)
This returns 1.267492, which matches VIF(lm(Wind ~ Temp + Solar.R, data = airquality)) from fmsb in R.
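For reference, a minimal sketch of that R cross-check (assuming the fmsb package and the built-in airquality data):

# Cross-check against fmsb::VIF in R, as referenced above
library(fmsb)
VIF(lm(Wind ~ Temp + Solar.R, data = airquality))  # ~1.2675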
I have unbalanced panel data and I want to fit this type of regression:
Pr(y=1|xB) = G(xB+a)
where "y" is a binary variable, "x" vector of explanatory variables and "B" my coeff.
I want to implement random effect model with maximum likelihood estimation, however I didn't understand what I need to change in the plm function (of package plm) CRAN guide (vignette). As far I used this code:
library(plm)
p_finale <- plm.data(p_finale, index=c("idnumber","Year"))
attach(p_finale)
y <- (TotalDebt_dummy)
X_tot <- cbind(Size,ln_Age,liquidity,Asset_Tangibility,profitability,growth, sd_cf_risk1, family_dummy,family_manager,
sd_cf_risk1*family_dummy,
Ateco_A,Ateco_C,Ateco_D,Ateco_E,Ateco_F,Ateco_G,Ateco_H,Ateco_I,Ateco_J,Ateco_M,Ateco_N,
Ateco_Q,Ateco_R)
model1 <- plm(y~X_tot+factor(Year),data = p_finale, model="random")
I included the whole code, but the only thing I believe needs to change is the last line, the call to plm.
Function plm from package plm does not use a maximum-likelihood approach for model estimation. It uses a GLS approach, as is common in econometrics.
Please see the section about plm versus nlme and lme4 in the package's first vignette ("Panel data econometrics with R: the plm package", https://cran.rstudio.com/web/packages/plm/vignettes/A_plmPackage.html). The section explains the differences between the approaches and has code examples for both (and refers to packages nlme and lme4 for the maximum-likelihood approach).
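For the maximum-likelihood random-effects route the vignette points to, a minimal sketch with lme4, assuming a logit link for G and reusing the variable names from your code (only a subset of the regressors is shown), could look like:

library(lme4)

# Random-intercept logit estimated by maximum likelihood;
# add the remaining regressors from X_tot (Ateco dummies, etc.) as needed,
# and switch to binomial(link = "probit") if G is the normal CDF.
ml_fit <- glmer(TotalDebt_dummy ~ Size + ln_Age + liquidity + Asset_Tangibility +
                  profitability + growth + sd_cf_risk1 * family_dummy +
                  family_manager + factor(Year) + (1 | idnumber),
                data = p_finale, family = binomial(link = "logit"))
summary(ml_fit)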
My data frame looks something like the following:
unique.groups <- letters[1:5]
unique_timez <- 1:20
groups <- rep(unique.groups, each = 20)
my.times <- rep(unique_timez, 5)
play.data <- data.frame(groups, my.times, y = rnorm(100), x = rnorm(100), POP = 1:100)
I would like to run the following weighted regression:
plm(y ~ x + factor(my.times),
    data = play.data,
    index = c('groups', 'my.times'), model = 'within', weights = POP)
But I do not believe the plm package allows for weights. The answer I'm looking for is the coefficient from the model below:
fit.regular <- lm(y ~ x + factor(my.times) + factor(groups),
                  weights = POP, data = play.data)
desired.answer<- coefficients(fit.regular)
However, I am looking for an answer with the plm package, because with larger datasets and many groups it is much faster to get the within-estimator coefficient from plm.
Edit: This problem no longer exists, since plm now supports a weights argument (see @Helix123's comment above).
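With a recent plm version, the call could then be a minimal sketch along these lines (assuming weights is evaluated within data, as in lm(); exact behaviour depends on your plm version):

library(plm)

# Weighted within (fixed effects) estimator; requires a plm version
# that supports the 'weights' argument.
fit.plm <- plm(y ~ x + factor(my.times),
               data = play.data,
               index = c("groups", "my.times"),
               model = "within",
               weights = POP)
coefficients(fit.plm)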
Even though I know of no solution with the plm package, the felm function in the lfe package handles weights correctly in the context of fixed effects (which seems to be what you need, judging from the syntax of your example code). It is written with a particular focus on speed in the presence of many observations and groups.
The lfe package focuses on fixed effects only, so if you need random effects the lme4 package might be more suited to your needs.
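A minimal sketch of the weighted fixed-effects fit with lfe for the play.data example above (assuming felm takes the weights as a numeric vector):

library(lfe)

# Both group and time fixed effects are projected out; weights are passed as a vector.
fit.felm <- felm(y ~ x | groups + my.times, data = play.data,
                 weights = play.data$POP)
coef(fit.felm)  # the x coefficient should match the weighted lm() benchmark above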
I am looking for exactly this information. I found this answer http://r.789695.n4.nabble.com/Longitudinal-Weights-in-PLM-package-td3298823.html by one of the authors of the package, which seems to suggest there is no way of using weights directly within the plm package.
I have run a few penalized logistic regression models in R using the logistf package. However, I wish to plot some forest plots for the data.
The sjPlot package (http://www.strengejacke.de/sjPlot/custplot/) provides an excellent function for glm output, but no function for logistf objects.
Any assistance?
The logistf objects differ in their structure compared to glm objects, but not too much. I've added support for logistf-fitted models; however, 1) model summaries can't be printed and 2) predicted probability plots are currently not supported for logistf models.
I'll update the code on GitHub tonight, so you can try the updated sjp.glm function...
library(sjPlot)
library(logistf)
data(sex2)
fit <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)
# for this example, axisLimits need to be specified manually
sjp.glm(fit, axisLimits = c(0.05, 25), transformTicks = TRUE)
I am trying to apply RDA to my data in R. After some research I found an R package called "rda" which seems able to do the job for me. However, I looked at the description of the rda function in that package and I'm a little confused now:
Usage given in R:
rda(x, y, xnew=NULL, ynew=NULL, prior=table(y)/length(y), alpha=seq(0, 0.99, len=10), delta=seq(0, 3, len=10), regularization="S", genelist=FALSE, trace=FALSE)
I'm not sure what "alpha" and "delta" stand for in this case. I was taught that in RDA there are two parameters, "lambda" and "sigma", where lambda is a complexity parameter that dictates the balance between linear and quadratic discriminant analysis, and sigma is another parameter that regularises the covariance matrix further. Both of them lie between 0 and 1.
But for this "rda" function in R, the default values of delta range between 0 and 3, which confused me.
Could anyone explain this for me please? Thanks!
You can use the klaR package, which has an rda function with a parametrization of the regularization parameters similar to the one you described.
detach(package:rda)  # avoid the namespace clash with klaR
require(klaR)
data(iris)
# gamma and lambda are the regularization parameters, both between 0 and 1
x <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
predict(x, iris)
It's not a good idea to mix the two packages (there is a namespace clash for some functions); it's better to detach rda if you want to use klaR (or the other way around).
I apologize in advance if this question is too esoteric. I am using the Zelig package in R with a log-log regression model:
z.out <- zelig(lnDONATIONS ~ lnPRICE + lnFUNDRAISING + lnAGE, model = "ls", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
summary(s.out)
plot(s.out)
This works fine, but I am trying to implement something that is allowed in the Stata-based 'precursor' to Zelig (clarify): specifically, in clarify, after the setx command you can type simqi, tfunc(exp) to get the expected values based on the exponential transformation of the dependent variable (the simqi command in Stata is analogous to the sim command in R/Zelig). My question is: can this post-setx exponential transformation be done in R with the Zelig package, and if so, how? The very extensive Zelig documentation does not seem to have an analogue to the 'tfunc' option in clarify.
Thanks in advance for any insights.
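One hedged workaround, not a built-in tfunc equivalent, would be to back-transform the simulated quantities yourself. The sketch below assumes an older (pre-5.0) Zelig in which the simulated expected values are stored in s.out$qi$ev; the exact slot may differ in your version.

# Hypothetical workaround: exponentiate the simulated expected values manually.
# Assumes s.out$qi$ev holds the draws of E(lnDONATIONS | X) (older Zelig versions).
ev.log <- s.out$qi$ev
ev.exp <- exp(ev.log)                    # back to the donations scale
quantile(ev.exp, c(0.025, 0.5, 0.975))   # simulation-based summary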