How to use Stargazer R package with factor analysis output - r

I've used stargazer in the past with regression tables.
However I'd like to know how to use stargazer with output from factor analysis and principal component analysis.
My code runs as follows:
fa1 <- factanal(new2, factors = 4, rotation = "varimax", sort = TRUE)
print(fa1, digits = 3, cutoff = .5, sort = TRUE)
load <- fa1$loadings[,1:2]
plot(load,type="n")
text(load,labels=names(new2),cex=.7)
AND
two <- pca(new2, nfactors = 3)
THIS doesn't work - my only attempt so far.
stargazer(fa1, type = "text", title="Descriptive statistics", digits=1, out="table1.txt")
UPDATE: Since posting I have been able to convert the object to a data frame with:
converted <- as.data.frame(unclass(fa1$loadings))
I then used the code above successfully EXCEPT that the output doesn't seem to include individual factor scores.

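This might not be a perfect solution, and you may have worked it out by now, but you can print the loadings table as-is with stargazer by adding summary = FALSE as an option. Without it, stargazer prints summary statistics of the data frame's columns rather than its contents. Note that stargazer does not recognise the factanal object itself, so pass the converted data frame. This gives you only the factor loadings as output, not the individual factor scores.
For example:
stargazer(converted, summary = FALSE, title="Descriptive statistics", digits=1, out="table1.txt")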
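For illustration, here is a minimal, self-contained sketch of the conversion step. It uses the built-in ability.cov covariance matrix as stand-in data (not the new2 data from the question), and the names fa_demo and loadings_df are made up for the example. The stargazer call is skipped if the package is not installed.

```r
# Stand-in data: ability.cov ships with base R (datasets package).
fa_demo <- factanal(factors = 2, covmat = ability.cov)

# A "loadings" object is a matrix with an extra class attribute; unclass()
# drops it so as.data.frame() gives one row per variable, one column per factor.
loadings_df <- as.data.frame(unclass(fa_demo$loadings))

# summary = FALSE asks stargazer to print the data frame itself
# instead of summary statistics of its columns.
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(loadings_df, summary = FALSE, type = "text", digits = 2)
}
```

Factor scores are a separate matter: factanal only returns them when it is given the raw data together with a scores argument such as scores = "regression", so they would have to be exported as their own table.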

Related

Error with svyglm function in survey package in R: "all variables must be in design=argument"

New to stackoverflow. I'm working on a project with NHIS data, but I cannot get the svyglm function to work even for a simple, unadjusted logistic regression with a binary predictor and binary outcome variable (ultimately I'd like to use multiple categorical predictors, but one step at a time).
El_under_glm<-svyglm(ElUnder~SO2, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)
Error in eval(extras, data, env) :
object '.survey.prob.weights' not found
I changed the variables to 0 and 1 instead:
Under_narm$SO2REG<-ifelse(Under_narm$SO2=="Heterosexual", 0, 1)
Under_narm$ElUnderREG<-ifelse(Under_narm$ElUnder=="No", 0, 1)
But then get a different issue:
El_under_glm<-svyglm(ElUnderREG~SO2REG, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)
Error in svyglm.survey.design(ElUnderREG ~ SO2REG, design = SAMPdesign, :
all variables must be in design= argument
This is the design I'm using to account for the weights -- I'm pretty sure it's correct:
SAMPdesign=svydesign(data=Under_narm, id= ~NHISPID, weight= ~SAMPWEIGHT)
Any and all assistance appreciated! I've got a good grasp of stats but am a slow coder. Let me know if I can provide any other information.
Using some make-believe sample data I was able to get your model to run by setting rescale = TRUE. The documentation states
Rescaling of weights, to improve numerical stability. The default
rescales weights to sum to the sample size. Use FALSE to not rescale
weights.
So one solution may be simply to set rescale = TRUE.
library(survey)
# sample data
Under_narm <- data.frame(SO2 = factor(rep(1:2, 1000)),
ElUnder = sample(0:1, 1000, replace = TRUE),
NHISPID = paste0("id", 1:1000),
SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE))
# with 'rescale' = TRUE
SAMPdesign=svydesign(ids = ~NHISPID,
data=Under_narm,
weights = ~SAMPWEIGHT)
El_under_glm<-svyglm(formula = ElUnder~SO2,
design=SAMPdesign,
family=quasibinomial(), # this family avoids warnings
rescale=TRUE) # Weights rescaled to the sum of the sample size.
summary(El_under_glm, correlation = TRUE) # use correlation with summary()
Otherwise, looking at the code for this function's method with 'survey:::svyglm.survey.design', it seems there may be a bug. I could be wrong, but by my read, when 'rescale' is FALSE, .survey.prob.weights never gets assigned a value.
if (is.null(g$weights))
g$weights <- quote(.survey.prob.weights)
else g$weights <- bquote(.survey.prob.weights * .(g$weights)) # bug?
g$data <- quote(data)
g[[1]] <- quote(glm)
if (rescale)
data$.survey.prob.weights <- (1/design$prob)/mean(1/design$prob)
There may be a workaround: assign a numeric vector to .survey.prob.weights in the global environment. I have no idea what these values should be, but your error goes away if you do something like the following. (.survey.prob.weights needs one value per row of the design data. The sample data above has 2000 rows rather than 1000, because rep(1:2, 1000) has length 2000 and data.frame() recycles the shorter columns to match.)
SAMPdesign=svydesign(ids = ~NHISPID,
data=Under_narm,
weights = ~SAMPWEIGHT)
.survey.prob.weights <- rep(1, 2000)
El_under_glm<-svyglm(formula = ElUnder~SO2,
design=SAMPdesign,
family=quasibinomial(),
rescale=FALSE)
summary(El_under_glm, correlation = TRUE)
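As for the second error ("all variables must be in design= argument"), one possible cause is that the recoded columns were added to Under_narm after the design object had been created, so the design never saw them. This is an assumption, since the question does not show the order in which the code was run. The sketch below uses made-up data and hypothetical names (demo_data, demo_design, exposure01, outcome01) and adds the recoded variables to the design itself with update(). It only runs if the survey package is installed.

```r
if (requireNamespace("survey", quietly = TRUE)) {
  library(survey)
  set.seed(1)

  # Made-up stand-in for the NHIS extract.
  demo_data <- data.frame(
    id       = 1:200,
    wt       = sample(c(0.5, 2), 200, replace = TRUE),
    exposure = sample(c("Heterosexual", "Other"), 200, replace = TRUE),
    outcome  = sample(c("No", "Yes"), 200, replace = TRUE)
  )
  demo_design <- svydesign(ids = ~id, weights = ~wt, data = demo_data)

  # Columns added to demo_data from here on are not visible to demo_design.
  # update() adds the recoded variables to the design object directly.
  demo_design <- update(demo_design,
                        exposure01 = ifelse(exposure == "Heterosexual", 0, 1),
                        outcome01  = ifelse(outcome == "No", 0, 1))

  demo_fit <- svyglm(outcome01 ~ exposure01, design = demo_design,
                     family = quasibinomial())
  print(coef(demo_fit))
}
```

Re-running svydesign() after recoding the data frame should have the same effect.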

Why when I run the ggttest there is an error?

When I run a t-test on a numeric and a dichotomous variable there is no problem and I can see the results. The problem arises when I pass the same t-test to ggttest: I get an error saying that one of my variables is not found. I do not know why that happens. The aml dataset I used is from the boot package. Below you can see the code:
https://i.stack.imgur.com/7kuaA.png
library(gginference)
time_group.test16537 = t.test(formula = time~group,
data = aml,
alternative = "two.sided",
paired = FALSE,
var.equal = FALSE,
conf.level = 0.95)
time_group.test16537
ggttest(time_group.test16537,
colaccept="lightsteelblue1",
colreject="gray84",
colstat="navyblue")
The problem comes with these lines of code in ggttest:
datnames <- strsplit(t$data.name, splitter)
len1 <- length(eval(parse(text = datnames[[1]][1])))
len2 <- length(eval(parse(text = datnames[[1]][2])))
It tries to evaluate time and group to get their lengths, but it does not know that they are columns of a data.frame, so the lookup fails. Pretty bad bug...
For your situation, supposedly you have less than 30 in each group and it plots a t-distribution, so do:
library(gginference)
library(boot)
gginference:::normt(t.test(time~group,data=aml),
colaccept = "lightsteelblue1",colreject = "grey84",
colstat = "navyblue")
t.test doesn't store your data in its output, so there is no way to extract the data from the list that t.test returns.
The only way to use a formula is to reference the columns in full, so that each name can be evaluated on its own:
library(gginference)
t_test <- t.test(questionnaire$pulse ~ questionnaire$gender)
ggttest(t_test)
Original answer here: How to extract the dataset from an "htest" object when using formula in r
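To see why the lookup fails, it may help to compare what t.test stores in data.name for each calling style. The sketch below uses the built-in sleep data instead of aml so that it runs without extra packages. The names res_formula, res_vectors, x, y and lengths_found are made up for the example, and the eval(parse(...)) line only imitates the ggttest snippet quoted above rather than calling ggttest itself.

```r
# Formula interface with data =: data.name is a label built from the
# column names, which cannot be evaluated outside the data frame.
res_formula <- t.test(extra ~ group, data = sleep)
print(res_formula$data.name)

# Two-vector interface: data.name holds the names of objects that
# exist in the workspace, so evaluating them works.
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
res_vectors <- t.test(x, y)
print(res_vectors$data.name)

# Imitates the lookup ggttest appears to do on each name.
lengths_found <- c(length(eval(parse(text = "x"))),
                   length(eval(parse(text = "y"))))
print(lengths_found)
```

Passing plain vectors is therefore another way to give ggttest names it can resolve, under the assumption that it handles the "x and y" form of data.name.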

Apply PCA for data stored in list

My image data is stored in a list. For every pixel (626257) of my image I have a vector containing all the values corresponding to the different wavelengths (44 wavelengths). Now I would like to carry out a principal component analysis (PCA). Unfortunately, I am not able to convert my listed data into the desired form. Here is the code to generate a dummy data set.
test = replicate(626257, rnorm(44, 3, 1),simplify = FALSE)
When I now try to carry out the PCA then the following error message pops up.
pca = prcomp(test, scale = F)
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
How can I convert my list into a suitable datatype?
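We could change simplify to TRUE in replicate, so that it returns a numeric matrix instead of a list, and it should work. The matrix has one column per pixel, so transpose it if each pixel should be treated as an observation:
test <- replicate(10, rnorm(44, 3, 1), simplify = TRUE) # 44 x 10 numeric matrix
pca = prcomp(t(test), scale = FALSE) # rows are pixels, columns are wavelengths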
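If the data already exists as a list and cannot be regenerated, the list can be bound into a matrix instead. A minimal sketch, assuming every list element is a numeric vector of the same length; it uses 100 dummy pixels rather than 626257 to keep it quick, and the names pixel_list, pixel_matrix and pca_demo are made up for the example.

```r
set.seed(42)

# Dummy data in the same shape as the question: a list of 44-value vectors.
pixel_list <- replicate(100, rnorm(44, 3, 1), simplify = FALSE)

# Stack the vectors as rows: one row per pixel, one column per wavelength.
pixel_matrix <- do.call(rbind, pixel_list)
print(dim(pixel_matrix))

# prcomp treats rows as observations and columns as variables.
pca_demo <- prcomp(pixel_matrix, scale. = FALSE)
print(summary(pca_demo)$importance[, 1:3])
```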

"Input datasets must be dataframes" error in kamila package in R

I have a mixed type data set, one continuous variable, and eight categorical variables, so I wanted to try kamila clustering. It gives me an error when I use one continuous variable, but when I use two continuous variables it is working.
library(kamila)
data <- read.csv("mixed.csv",header=FALSE,sep=";")
conInd <- 9
conVars <- data[,conInd]
conVars <- data.frame(scale(conVars))
catVarsFac <- data[,c(1,2,3,4,5,6,7,8)]
catVarsFac[] <- lapply(catVarsFac, factor)
kamRes <- kamila(conVars, catVarsFac, numClust=5, numInit=10,calcNumClust = "ps",numPredStrCvRun = 10, predStrThresh = 0.5)
Error in kamila(conVar = conVar[testInd, ], catFactor =
catFactor[testInd, : Input datasets must be dataframes
I think the problem is that the function assumes that you have at least two of both data types (i.e. >= 2 continuous variables, and >= 2 categorical variables). It looks like you supplied a single column index (conInd = 9, just column 9), so you have only one continuous variable in your data. Try adding another continuous variable to your continuous data.
I had the same problem (with categoricals) and this approach fixed it for me.
I think the ultimate source of the error in the program is at around line 170 of the source code. Here's the relevant snippet...
numObs <- nrow(conVar)
numInTest <- floor(numObs/2)
for (cvRun in 1:numPredStrCvRun) {
for (ithNcInd in 1:length(numClust)) {
testInd <- sample(numObs, size = numInTest, replace = FALSE)
testClust <- kamila(conVar = conVar[testInd,],
catFactor = catFactor[testInd, ],
numClust = numClust[ithNcInd],
numInit = numInit, conWeights = conWeights,
catWeights = catWeights, maxIter = maxIter,
conInitMethod = conInitMethod, catBw = catBw,
verbose = FALSE)
When the code splits off a test subset of your data, it selects rows from a one-column data.frame, and by default (drop = TRUE) that returns a plain vector rather than a data.frame. So you end up with "not a data.frame" even though you did supply a data.frame. That's where the error comes from.
If you can't dig up another variable to add to your data, you could edit the code such that the calls to kamila in the cvRun for loop wrap the data.frame() function around any subsetted conVar or catFactor, e.g.
testClust <- kamila(conVar = data.frame(conVar[testInd, ]),
catFactor = data.frame(catFactor[testInd, ]), ... )
and just save that as your own version of the function called say, my_kamila, which you could use instead.
Hope this helps.
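The dropping behaviour described above is easy to reproduce in base R. A small sketch with made-up names (one_col, rows, dropped, kept):

```r
# A data.frame with a single column, like conVars in the question.
one_col <- data.frame(x = c(2.5, 3.1, 4.7, 5.0))
rows <- c(1, 3)

# Default row selection on one column drops down to a plain vector.
dropped <- one_col[rows, ]
print(is.data.frame(dropped))

# drop = FALSE keeps the data.frame structure and the column name.
kept <- one_col[rows, , drop = FALSE]
print(is.data.frame(kept))
```

In a personal copy of the function, writing conVar[testInd, , drop = FALSE] would be an alternative to wrapping the subset in data.frame(), with the side benefit that the original column name is preserved.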

stargazer: create table with regression output without model objects

I have computed several models, using both glm() and rxGlm() (the second one is from Microsoft R). Unfortunately, rxGlm() does not store all information required by stargazer. So when trying to create the summary table (even after adjusting the RxGlm data via as.glm() ), I get the following error message:
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
Rank zero or should not have used lm(.., qr=FALSE).
I am already reading out the t-statistics and p-values separately and feeding them back to stargazer. However, stargazer still requires the model objects to be stored in the workspace and otherwise throws an error.
This is how I extract the statistics from the model output:
obj1.t <- summary(obj1)$coef[ , "z value"]
obj1.p <- summary(obj1)$coef[ , "Pr(>|z|)"]
This is a simplified form of my stargazer command, where se = and p = are used to feed back the previously extracted statistics.
stargazer(list(obj1, obj2),
type = "html", table.layout = "cd=!t-s-!a=!n", star.cutoffs=c(0.05,0.01,0.001), no.space = TRUE,
omit = c(1989:2015), font.size = "normalsize",
out = "Test.html",
df= FALSE,
column.labels = c("(1)", "(2)"),
add.lines = list(c("fixed effects", "No", "Yes")),
dep.var.labels = c("Dummy"),
title = "GLM PROBIT MODEL",
se= list(obj1.t, obj2.t),
p = list(obj1.p, obj2.p),
notes = "t statistics shown in parentheses")
Now my question: is there a way to create regression output tables with stargazer without having to provide the model output objects? So basically store all the required data in separate vectors and then feed them back to stargazer? RxGlm summaries provide all the information that is necessary to fill the regression results table manually. However, I am looking for a way to do it automatically.
