I have computed several models, using both glm() and rxGlm() (the latter from Microsoft R). Unfortunately, rxGlm() does not store all the information stargazer requires, so when I try to create the summary table (even after converting the rxGlm object via as.glm()), I get the following error message:
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
Rank zero or should not have used lm(.., qr=FALSE).
I already extract the t-statistics and p-values separately and feed them back to stargazer. However, stargazer still requires the model output objects themselves to be present in the workspace and otherwise throws an error.
This is how I extract the statistics from the model output:
obj1.t <- summary(obj1)$coef[ , "z value"]
obj1.p <- summary(obj1)$coef[ , "Pr(>|z|)"]
This is a simplified form of my stargazer command, where se = and p = are used to feed back the previously extracted statistics.
stargazer(list(obj1, obj2),
          type = "html", table.layout = "cd=!t-s-!a=!n",
          star.cutoffs = c(0.05, 0.01, 0.001), no.space = TRUE,
          omit = c(1989:2015), font.size = "normalsize",
          out = "Test.html",
          df = FALSE,
          column.labels = c("(1)", "(2)"),
          add.lines = list(c("fixed effects", "No", "Yes")),
          dep.var.labels = c("Dummy"),
          title = "GLM PROBIT MODEL",
          se = list(obj1.t, obj2.t),
          p = list(obj1.p, obj2.p),
          notes = "t statistics shown in parentheses")
Now my question: is there a way to create regression output tables with stargazer without having to provide the model output objects? That is, can I store all the required data in separate vectors and then feed them to stargazer? The rxGlm summaries provide all the information necessary to fill in the regression results table manually; however, I am looking for a way to do it automatically.
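One workaround sketch (untested against rxGlm itself): fit a cheap glm() "skeleton" with the same specification just so stargazer has a valid model object, then override every reported statistic via stargazer's coef=, se=, and p= arguments, which each take a list with one numeric vector per model. Here the vectors are read from the skeleton only to demonstrate the mechanics; with rxGlm you would fill them from the rxGlm summary instead. The simulated data and object names are illustrative, not from the original post.

```r
# Toy data standing in for the real estimation sample
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- rbinom(100, 1, pnorm(0.5 * dat$x))

# Skeleton model: same formula/family as the rxGlm model
skeleton <- glm(y ~ x, family = binomial(link = "probit"), data = dat)

# With rxGlm, these vectors would come from the rxGlm summary instead
b <- coef(summary(skeleton))[, "Estimate"]
t <- coef(summary(skeleton))[, "z value"]
p <- coef(summary(skeleton))[, "Pr(>|z|)"]

if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(skeleton,
                       coef = list(b),  # replaces the reported coefficients
                       se   = list(t),  # t-statistics shown in parentheses
                       p    = list(p),  # replaces the reported p-values
                       type = "text",
                       notes = "t statistics shown in parentheses")
}
```

Since coef=, se=, and p= fully override what stargazer reads from the object, the skeleton only needs the right variable names and dimensions, not the right estimates.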
When trying to graph the conditional fixed effects of a glmmTMB model with two random intercepts in GGally I get the error:
There was an error calling "tidy_fun()". Most likely, this is because the
function supplied in "tidy_fun=" was misspelled, does not exist, is not
compatible with your object, or was missing necessary arguments (e.g. "conf.level=" or "conf.int="). See error message below.
Error: Error in "stop_vctrs()":
! Can't recycle "..1" (size 3) to match "..2" (size 2).
I have tinkered with figuring out the issue, and it seems to be related to the two random intercepts included in the model. I have also tried extracting the coefficient and standard error information separately through broom.mixed::tidy and then feeding the data frame into GGally::ggcoef(), to no avail. Any suggestions?
# Example with built-in randu data set
library(glmmTMB)
library(GGally)
data(randu)
randu$A <- factor(rep(c(1, 2), 200))
randu$B <- factor(rep(c(1, 2, 3, 4), 100))
# Model
test <- glmmTMB(y ~ x + z + (0 + x | A) + (1 | B), family = "gaussian", data = randu)
# A few of my attempts at graphing--works fine when only one random effects term is in model
ggcoef_model(test)
ggcoef_model(test, tidy_fun = broom.mixed::tidy)
ggcoef_model(test, tidy_fun = broom.mixed::tidy, conf.int = T, intercept=F)
ggcoef_model(test, tidy_fun = broom.mixed::tidy(test, effects="fixed", component = "cond", conf.int = TRUE))
There are some (old!) bugs that have recently been fixed (here, here) that would make confidence interval reporting on RE parameters break for any model with multiple random terms (I think). I believe that if you are able to install updated versions of both glmmTMB and broom.mixed:
remotes::install_github("glmmTMB/glmmTMB/glmmTMB@ci_tweaks")
remotes::install_github("bbolker/broom.mixed")
then ggcoef_model(test) will work.
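If updating the packages is not an option, one fallback (a sketch, not from the original answer) is to tidy only the fixed-effect ("cond") component yourself with broom.mixed and draw the coefficient plot directly in ggplot2, bypassing ggcoef_model()'s internal tidier entirely:

```r
# Rebuild the example data (randu ships with base R's datasets package)
data(randu)
randu$A <- factor(rep(c(1, 2), 200))
randu$B <- factor(rep(c(1, 2, 3, 4), 100))

if (requireNamespace("glmmTMB", quietly = TRUE) &&
    requireNamespace("broom.mixed", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  test <- glmmTMB::glmmTMB(y ~ x + z + (0 + x | A) + (1 | B),
                           family = "gaussian", data = randu)
  # Fixed effects only, so the RE-parameter CI code is never touched
  fixed <- broom.mixed::tidy(test, effects = "fixed",
                             component = "cond", conf.int = TRUE)
  ggplot(fixed[fixed$term != "(Intercept)", ],
         aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high)) +
    geom_pointrange()
}
```

This avoids the recycling error because the confidence intervals for the random-effect parameters, which triggered the bug, are never requested.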
So I have two datasets, og.data and newdata.df. I have matched their features, and I want to use a feature from og.data to train a model so I can identify cases of this class in newdata.df. I am using the randomForest package in R; its documentation is here: https://cran.r-project.org/web/packages/randomForest/randomForest.pdf
library(caTools)  # for sample.split()
split <- sample.split(og.data$class_label, SplitRatio = 0.7)
training_set <- subset(og.data, split == TRUE)
test_set <- subset(og.data, split == FALSE)
rf.classifier.object <- randomForest(x = training_set[-1],
                                     y = training_set$Engramcell,
                                     ntree = 500)
I then use the test set to calculate the AUC and to visualize the ROC curve, precision, recall, etc.
I do that using prediction probabilities generated like so:
predictions.df <- as.data.frame(predict(rf.classifier.object,
test_set,
type = "prob")
)
All is good. I then proceed to use the trained classifier on new data, and now I encounter a problem: the new data does not contain the class label feature. This is annoying, as the entire purpose of training the classifier is to label this new data.
predictions.df <- as.data.frame(predict(rf.classifier.object,
newdata.df,
type = "prob")
)
Please note the error has different variable names simply because I changed the code to make it more general for readability.
Error in predict.randomForest(rf.classifier.object, newdata.df, :
variables in the training data missing in newdata
As per this Stack Overflow post, predict.randomForest(), called here as predict(), uses the rownames of the feature importance matrix to determine which variables it needs. When I check the feature names, I find that it is in fact the class label that is causing the failure, as I show below.
# > rownames(rf.classifier.object$importance)[!(rownames(rf.classifier.object$importance) %in% colnames(newdata) )]
# [1] "class_label"
It is not clear to me what I should change in my script so that the classifier can be used on data other than the test set. I have followed the instructions exactly, so this seems like a bad design choice in the function. In my opinion, the class label should not be used for calculating feature importance at all and should not even be considered a feature.
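A minimal fix sketch: make sure the class label never enters the predictor matrix by dropping it by name rather than by position. drop_label() and the toy data below are illustrative, not part of randomForest or the original post.

```r
# Drop a column by name; drop = FALSE keeps the result a data frame
drop_label <- function(df, label) df[, setdiff(names(df), label), drop = FALSE]

# Toy data standing in for og.data
og.toy <- data.frame(f1 = rnorm(20), f2 = rnorm(20),
                     class_label = factor(rep(c("a", "b"), 10)))

x_train <- drop_label(og.toy, "class_label")  # predictors only
y_train <- og.toy$class_label                 # label passed separately via y=

if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(x = x_train, y = y_train, ntree = 50)
  # predict(rf, newdata) now only needs f1 and f2, so new data
  # without a class_label column no longer triggers the error
}
```

The point is that `training_set[-1]` only removes the first column, so if class_label sits anywhere else it silently becomes a predictor and predict() will then demand it in every newdata.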
I looked at many posts on similar issue but didn't find a solution for my case.
I have balanced panel data for the CBSAs in the US from 2005-2015. When I run a fixed effects model with the following command, I get reasonable results, and the panel is shown to be balanced.
plm(formula = fm, data = pCBSA_balanced, effect = "twoway", model = "within")
Balanced Panel: n = 938, T = 11, N = 10318
(actual results omitted...)
However, when I proceed to run the same model with a spatial lag using the following command and its variations,
spml(formula = fm, data = pCBSA_balanced, listw = newCBSA_nb.lw, model = "within",
     effect = "twoways", lag = TRUE, zero.policy = TRUE, na.action = na.pass)
I get the following error...
Error in if (!balanced) stop("Estimation method unavailable for unbalanced panels") : missing value where TRUE/FALSE needed
I made sure there are no missing values in my panel data, ordered the panel so that the id variable CBSAFP and year are the first and second columns, and then sorted by year as suggested by other posts on similar situations, but the error persists.
I suspect the error might have something to do with the 42 CBSAs without neighbors in the weighting matrix. This is how I defined my listw, with the zero.policy option:
newCBSA_nb <- poly2nb(CBSA4p, queen = TRUE, row.names = CBSA4p$CBSAFP)
newCBSA_nb.lw <- nb2listw(newCBSA_nb, style="W", zero.policy=T)
I'm not willing to simply drop the non-neighboring CBSAs from my analysis. Does anyone have any suggestions on what I should do next?
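One possible workaround sketch, targeting the suspected cause rather than a confirmed fix: instead of leaning on zero.policy, build a k-nearest-neighbour listw so that every CBSA has at least k neighbours and the weights matrix has no empty rows. The toy coordinates below stand in for the real CBSA centroids (in the actual case, something like sp::coordinates(CBSA4p)).

```r
# Toy coordinates standing in for CBSA centroids
set.seed(42)
coords <- cbind(runif(20), runif(20))

if (requireNamespace("spdep", quietly = TRUE)) {
  # Link each unit to its 5 nearest neighbours; no unit ends up isolated
  knn_nb <- spdep::knn2nb(spdep::knearneigh(coords, k = 5))
  knn.lw <- spdep::nb2listw(knn_nb, style = "W")  # no zero.policy needed
}
```

This keeps all 42 isolated CBSAs in the sample, at the cost of replacing contiguity with distance-based neighbourhood, which is a substantive modelling choice you would want to justify.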
I've used stargazer in the past with regression tables.
However I'd like to know how to use stargazer with output from factor analysis and principal component analysis.
My code runs as follows:
fa1 <- factanal(new2, factors = 4, rotation = "varimax", sort = TRUE)
print(fa1, digits = 3, cutoff = .5, sort = TRUE)
load <- fa1$loadings[,1:2]
plot(load,type="n")
text(load,labels=names(new2),cex=.7)
AND
two <- pca(new2, nfactors = 3)
THIS doesn't work - my only attempt so far.
stargazer(fa1, type = "text", title="Descriptive statistics", digits=1, out="table1.txt")
UPDATE: Since posting I have been able to convert the object to a data frame with:
converted <- as.data.frame(unclass(fa1$loadings))
I then used the code above successfully EXCEPT that the output doesn't seem to include individual factor scores.
See below (loadings output not reproduced here).
This might not be a perfect solution, and you might have figured it out by now, but what you can do is output the factor loadings with stargazer by adding summary = FALSE as an option. This way you get only the factor loadings as output.
For example like this:
stargazer(converted, summary = FALSE, title = "Descriptive statistics", digits = 1, out = "table1.txt")
I have estimated a tobit type 2 model with the selection() function of the sampleSelection package in R.
I now want to create a regression table with stargazer, which officially supports the sampleSelection package and its 'selection' objects.
stargazer(tobit, tobit_2, type = "html", out = "tobit2.html", model.names = TRUE,
          star.char = c("+", "*", "**", "***"), star.cutoffs = c(0.1, 0.05, 0.01, 0.001),
          report = 'vc*p', notes = "+ p<0.1; * p<0.05; ** p<0.01; *** p<0.001",
          notes.append = FALSE, selection.equation = TRUE)
According to the official documentation it is, however, only possible to report either the selection or the outcome equation. I obviously want to report both next to each other.
selection.equation
a logical value that indicates whether the selection equation (when argument is
set to TRUE) or the outcome equation (default) will be reported for heckit and
selection models from the package sampleSelection
Has anyone encountered this issue before and found a way to conveniently report both in one table?
Thanks a lot!
I found a (rather dirty) solution which may still help others:
I set selection.equation to TRUE, duplicate the selection object, and swap the reference indices for the selection and outcome equations in the duplicate. Calling stargazer on both models then gives a table with the selection and the outcome equation side by side (although stargazer still thinks it returned the selection equation both times).
# tobit_2 is a selection-object returned from the selection() function
# from the sampleSelection package
tobit_2O <- tobit_2
tobit_2O$param$index$betaO <- tobit_2$param$index$betaS
tobit_2O$param$index$betaS <- tobit_2$param$index$betaO
stargazer(tobit_2, tobit_2O, selection.equation = TRUE,
column.labels = c("<em>selection</em>", "<em>outcome</em>"))