I am looking for ways to determine the optimal number of factors for R's factanal function. I already know the most commonly used method (run a PCA and use the scree plot to choose the number of factors). I found the method described here easier for non-technical folks like me. Unfortunately, the R script in which the method was implemented is no longer accessible. Is there a package available in R that does the same?
The method was originally proposed in this study: Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure.
The R code has now been moved here, according to the author.
EFA.dimensions is also a nice and easy-to-use package for that.
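If it helps, I believe the EFAtools package also implements Ruscio & Roche's comparison-data approach in its CD() function. A minimal sketch, assuming a raw data frame of item responses called items (the name is hypothetical) and that CD() is available in your version of EFAtools:

# install.packages("EFAtools")   # if not already installed
library(EFAtools)

# items: hypothetical data frame of raw item responses (rows = respondents)
cd_result <- CD(items)   # comparison-data method (Ruscio & Roche)
cd_result                # printing the result reports the suggested number of factors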
I've tried to search for an answer to this question but have not found one.
Does the CochranArmitageTest() function in R support adjustments for additional variables, apart from the two-level dependent variable and the k-level independent variable? If so, how should the x argument (frequency table or matrix) be presented to the function?
Best regards
That particular test is written specifically to assess a single ordinal or categorical variable against a binary outcome variable. It is not, as far as I know, modifiable.
However, it is my understanding that the Bioconductor project, which curates many pharmaceutical, biological, and genetics packages in R, hosts a package developed about five years ago to work with multiple categorical or ordinal variables and a binary outcome.
It is the globaltest package, which you can install directly from the Bioconductor repository with the following:
BiocManager::install("globaltest")
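For completeness, a rough sketch of what an adjusted call to the package's main function, gt(), might look like; the data frame and variable names (dat, outcome, dose, covariate) are hypothetical, and my reading of the interface (first formula = response plus null model, second argument = the term being tested) should be checked against the package vignette:

# install.packages("BiocManager")   # if BiocManager itself is missing
library(globaltest)

# dat: hypothetical data frame with a binary outcome, an ordinal dose,
# and an additional covariate to adjust for.
# Null model: outcome ~ covariate; tested term: dose (my reading of the docs).
gt(outcome ~ covariate, ~ dose, data = dat, model = "logistic")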
Here is the PDF explaining the whole package:
I am using the 'arules' package to mine frequent itemsets in my big data, but I cannot find suitable methods for discretization.
As shown in the 'arules' examples, several basic unsupervised methods can be used in the discretize() function, but I want to estimate the optimal number of categories in my large dataset, which seems more reasonable than assigning the number of categories by hand.
Can you give me any advice on this? Thanks.
@Michael Hahsler
I think there is little guidance on this for unsupervised discretization. Look at the histogram for each variable and decide manually. For k-means, you could potentially use strategies to find k via internal validation techniques (e.g., the elbow method), as sketched below. For supervised discretization there are methods that will help you decide. Maybe someone else can help here.
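A minimal sketch of that elbow idea combined with arules::discretize(); the data and column names are hypothetical, and the choice of k from the plot is still a judgment call:

library(arules)

x <- mydata$some_numeric_column   # hypothetical numeric column to discretize

# Elbow heuristic: total within-cluster sum of squares for k = 1..10
wss <- sapply(1:10, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)
plot(1:10, wss, type = "b",
     xlab = "Number of categories (k)", ylab = "Total within-cluster SS")

k <- 4   # pick k at the 'elbow' of the plot
x_disc <- discretize(x, method = "cluster", breaks = k)
table(x_disc)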
I have been learning multivariate analyses in PRIMER, but now want to move to R using the vegan package. I wish to use the capscale() function in vegan, but am not sure how my data should be formatted beforehand.
In the example in the vignette http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/capscale.html, both data frames (varespec and varechem) have numeric values only, whereas I have one data frame of dependent (numeric) values and another of independent (factor) values. So what I am asking for is an alternative worked example that I could emulate. I can't find anything online.
The iris data set should provide sufficient toy data. Thank you
The vignette source you use is badly outdated: I ought to remove it from the web. The help page ?capscale should contain more up-to-date documentation in your current vegan installation. For independent data with factors, you should be able to follow the model of any other constrained ordination help page (?rda), which tells you that with the formula interface you can have factors in the independent data -- and the formula interface is the only interface allowed in capscale.
You should switch from capscale to dbrda in vegan: capscale may be deprecated in the future.
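To illustrate the formula interface, a minimal sketch using the dune data that ships with vegan (dune is the community matrix; dune.env holds the explanatory variables, including the factor Management); this only shows the mechanics, not your specific analysis:

library(vegan)
data(dune)       # dependent data: species abundances (numeric)
data(dune.env)   # independent data: includes factors such as Management

# Factors on the right-hand side of the formula are handled automatically
ord <- capscale(dune ~ Management + A1, data = dune.env, distance = "bray")

# dbrda() uses the same formula interface and is the suggested replacement
ord2 <- dbrda(dune ~ Management + A1, data = dune.env, distance = "bray")
anova(ord2, by = "terms")   # permutation tests for each constraining term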
This is a trivial question, but I'm new to R, and none of the tutorials I've seen address it. When using the plm package in R for my panel data, do I include the cross-sectional units (the individuals' variable) in my regression formula? While they don't speak to it directly, the tutorials I've seen seem to leave that variable out. However, in practice, the results look far more realistic when it is left in.
The package assumes that the individual and time indexes are in the first two columns. If they are not, use the index argument.
Reference: plm paper (section 4.1: Data structure)
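A minimal sketch with hypothetical column names (id, year, y, x1, x2); note that the individual identifier is passed through index rather than written into the formula:

library(plm)

# df: hypothetical panel data frame with columns id, year, y, x1, x2
fe <- plm(y ~ x1 + x2, data = df,
          index = c("id", "year"),   # individual and time indexes
          model = "within")          # fixed-effects ("within") estimator
summary(fe)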
I am working with several large databases (e.g. PISA and NAEP) that use a complex survey design with replicate weights and multiple plausible values. I can address the former using the survey package. However, is there an R package/function to analyze the latter?
For reference, I have found this article to provide a good overview of the issue: http://www.ierinstitute.org/fileadmin/Documents/IERI_Monograph/IERI_Monograph_Volume_02_Chapter_01.pdf
I'm not sure how the general idea of 'plausible values' differs from using multiple imputation to generate several sets of imputed values (as the Amelia package does, for example). But Thomas Lumley's mitools package can be used to combine the various sets of imputed values, and it may well be that it can also be used to combine your sets of plausible values to obtain the 'correct' standard errors of the estimates.
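A minimal sketch of that combining idea, assuming you can arrange the data as one data frame per plausible value (the list pv_data and the variable names are hypothetical; it ignores the replicate weights, which would need the survey package):

library(mitools)

# pv_data: hypothetical list of data frames, one per plausible value,
# identical except for the column holding that plausible value (score)
imp  <- imputationList(pv_data)
fits <- with(imp, lm(score ~ gender + ses))
summary(MIcombine(fits))   # combines the estimates across plausible values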
Daniel Caro developed an R package for large-scale assessments. You can find it here: http://cran.r-project.org/web/packages/intsvy/index.html
Here is a code example using the regression command over the plausible values for Mathematics:
library(intsvy)

# Table I.2.3a, p. 305, International Report 2012
pisa.reg.pv(pvlabel = "MATH", x = "ST04Q01", by = "IDCNTRYL", data = pisa)
I'm not sure, though, whether this package can be used to analyze NAEP data.
I hope this fulfills your purpose, at least partially.
As of survey version 3.36 there is withPV():
library(survey)
library(mitools)

data(pisamaths, package = "mitools")

des <- svydesign(id = ~SCHOOLID + STIDSTD, strata = ~STRATUM, nest = TRUE,
                 weights = ~W_FSCHWT + condwt, data = pisamaths)
options(survey.lonely.psu = "remove")

results <- withPV(list(maths ~ PV1MATH + PV2MATH + PV3MATH + PV4MATH + PV5MATH),
                  data = des,
                  action = quote(svyglm(maths ~ ST04Q01 * (PCGIRLS + SMRATIO) + MATHEFF + OPENPS,
                                        design = des)))

summary(MIcombine(results))