Alternative example for capscale function in vegan package - r

I have been learning multivariate analyses in PRIMER, yet now want to convert to R using the vegan package. I wish to use the capscale() function in vegan, yet am not sure how my data should be formatted beforehand.
In the example in the vignette http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/capscale.html, both dataframes (varespec and varechem) have numeric values only, yet I have one dataframe of dependent (numeric) values, and another of independent (factor) values. So what I am asking for is an alternative worked example that I might be able to emulate. I can't find anything online.
The iris data set should provide sufficient toy data. Thank you

The vignette you link to is badly outdated, and I have to remove it from the Web. The help page ?capscale in your current vegan installation contains more up-to-date documentation. For independent data with factors, you can follow the model of any other constrained ordination help page (e.g. ?rda), which tells you that the formula interface accepts factors in the independent data. The formula interface is the only interface capscale allows.
You should switch from capscale to dbrda in vegan: capscale may be deprecated in the future.
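A minimal sketch of what that could look like with the iris data the question suggests. The split into `resp` and `pred` and the choice of Bray-Curtis distance are assumptions made for illustration, not something the help pages prescribe:

```r
library(vegan)

data(iris)
# Response matrix: the four numeric measurements
resp <- iris[, 1:4]
# Predictor data frame: a single factor (Species)
pred <- iris[, "Species", drop = FALSE]

# Formula interface: responses on the left, factor(s) on the right
mod <- dbrda(resp ~ Species, data = pred, distance = "bray")
mod

# Permutation test of the constraint
anova(mod, permutations = 199)
```

The same formula should also work with capscale() in place of dbrda(); the factor is expanded into contrasts internally, so no manual dummy coding is needed.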

Related

Is it possible to adjust the CochranArmitageTest() function in R for additional variables?

I've tried to search for the answer to this question but have not found any.
Does the CochranArmitageTest() function in R support adjustments for additional variables, apart from the two-level dependent and k-leveled independent variable? How in that case should the x (frequency table or matrix) be presented for the function?
Best regards
That test is written only to assess a single ordinal or categorical variable against a binary outcome variable. It is not, as far as I know, modifiable.
However, it is my understanding that the Bioconductor project, which curates many pharmaceutical, biological and genetics packages in R, hosts a package developed about five years ago that works with multiple categorical or ordinal variables and a binary outcome.
It is the globaltest package, which you can install directly from the Bioconductor repository with:
BiocManager::install("globaltest")
Here is the PDF explaining the whole package:

Factor Variable Importance (VIMP) with RandomForestSRC Package: cannot coerce to data.frame error

Good afternoon, all--thank you in advance for your help! I'm somewhat new to R, so my apologies if this is a trivial or otherwise inappropriate question.
TL;DR: I'm trying to determine Variable Importance (VIM) for factor variables with a random forest model built in RandomForestSRC, which is not a built-in feature of that package. Using both the LIME and DALEX packages, I encounter the same error: cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame. Any assistance resolving this error, or alternate approaches, would be greatly appreciated!
I have a random forest model I've built in R, using the RandomForestSRC package. The model seems to work great--training and testing went fine, got the predicted output I needed, results seem in-line with what I would expect. Unfortunately, one of the requirements is that I need to be able to indicate how the model arrived at its conclusions (eg, I need to also include variable importance as a part of the output), for both continuous and factor variables.
This doesn't seem to be a built-in feature of the RandomForestSRC package, so I've looked into both the LIME and DALEX packages, both of which should be able to break out VIM from the existing RF model. Unfortunately, neither has native support for the RFSRC package, which means I've needed to build the prediction functions myself, as recommended by this vignette: https://uc-r.github.io/dalex
model_type.rfsrc <- function(x, ...) {
  return("classification")
}
predict_model.rfsrc <- function(x, newdata, type, ...) {
  as.data.frame(predict(x, newdata, ...))
}
Unfortunately, in running the VIM section of the model (in both LIME and DALEX), I'm asked to pass both the predicted output and the model that created that output. In doing so, it hits an error with the above predict_model function:
error in as.data.frame.default(predict(model, (newdata))):
cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame
And, like...of course, it can't; it's trying to turn the model itself into a data frame. Unfortunately, while I think I understand why R is giving me that error, that's about as far as I've been able to figure out on my own.
Additionally, I'm using the RandomForestSRC package for two reasons: it doesn't put a limit on the number of factor variables, and it can handle imbalanced data. I'm working with medical data, so both of these are necessary (eg, there are ~100,000 different medical codes that can be encoded in a single data variable, and the ratio of "people-who-don't-have-this-condition" vs "people-who-do-have-this-condition" is frequently 100 to 1). If anyone has any suggestions for alternative packages that handle these issues, though, and have built-in VIM functionality (or integrate with DALEX / LIME), that would be fantastic as well.
Thank you all very much for your help!
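A hedged sketch of one way around the coercion error, using iris as stand-in data (the forest settings here are arbitrary assumptions). For a classification forest, predict() returns a list-like object whose `predicted` component holds the class-probability matrix, so the wrapper can extract that component instead of coercing the whole object. The package's own `importance = TRUE` argument is also shown, since it appears to report importance for factor and continuous predictors alike:

```r
library(randomForestSRC)

# Toy classification forest; importance = TRUE stores variable importance in the fit
fit <- rfsrc(Species ~ ., data = iris, ntree = 100, importance = TRUE)

# predict() returns an object of class c("rfsrc", "predict", "class"),
# which as.data.frame() cannot coerce; the probabilities live in $predicted
pred <- predict(fit, newdata = iris[1:5, ])
probs <- as.data.frame(pred$predicted)

# A wrapper in the shape lime expects would then extract that component
predict_model.rfsrc <- function(x, newdata, type, ...) {
  as.data.frame(predict(x, newdata = newdata, ...)$predicted)
}

# Variable importance stored in the fit: one row per predictor
fit$importance
```

Whether the extracted data frame satisfies every check LIME or DALEX applies has not been verified here; the sketch only addresses the coercion step.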

Determine number of factors in EFA (R) using Comparison Data

I am looking for ways to determine the optimal number of factors for the R factanal function. The most used method (conduct a PCA and use a scree plot to determine the number of factors) is already known to me. I have found a method described here to be easier for non-technical folks like me. Unfortunately, the R script in which the method was implemented is no longer accessible. Is there an R package that does the same?
The method was originally proposed in this study: Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure.
According to the author, the R code has now moved here.
EFA.dimensions is also a nice and easy-to-use package for that.

PLM package in R

This is a trivial question, but I'm new to R and none of the tutorials I've seen address it. When using the PLM package in R for my panel data, do I include the cross-sectional units (the variable identifying the individuals) in my regression formula? While they don't speak to it directly, the tutorials I've seen seem to leave that variable out. However, in practice, the results are far more realistic when it is left in.
The package assumes that the individual and time indexes are in the first two columns. If they are not, use the index argument.
Reference: plm paper (section 4.1: Data structure)
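As an illustration of the index argument, here is a small sketch using the Grunfeld data that ships with plm. The choice of a fixed-effects ("within") model is an assumption made for the example:

```r
library(plm)

data("Grunfeld", package = "plm")

# firm is the individual index and year the time index; both are named
# explicitly through `index` and neither appears in the formula
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")
summary(fe)
```

Here the individual variable enters only through `index`; the within transformation accounts for the firm effects, so `firm` is not added as a regressor.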

Analysis of complex survey design with multiple plausible values

I am working with several large databases (e.g. PISA and NAEP) that use a complex survey design with replicate weights and multiple plausible values. I can address the former using the survey package. However, does there exist an R package/function to analyze the latter?
For reference, I have found this article to provide a good overview of the issue: http://www.ierinstitute.org/fileadmin/Documents/IERI_Monograph/IERI_Monograph_Volume_02_Chapter_01.pdf
I'm not sure how the general idea of 'plausible values' differs from using multiple imputation to generate several sets of imputed values (as the Amelia package does). But Thomas Lumley's mitools package can combine the various sets of imputed values, and it may be able to combine your sets of plausible values in the same way to obtain the 'correct' standard errors of the estimates.
Daniel Caro developed an R package for large-scale assessments. You can find it here: http://cran.r-project.org/web/packages/intsvy/index.html
This is a code example using the regression command over the plausible values in Mathematics:
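To make the combining step concrete, here is a base-R sketch of Rubin's rules, the standard pooling rules for multiply imputed estimates, which mitools::MIcombine is understood to apply. All numbers below are hypothetical and only stand in for one coefficient estimated once per plausible value:

```r
# Hypothetical estimates of one coefficient, one per plausible value
est <- c(12.1, 11.8, 12.4, 12.0, 11.9)
# Hypothetical sampling variances (squared standard errors) of those estimates
se2 <- c(0.90, 0.95, 0.88, 0.92, 0.91)

m <- length(est)
pooled  <- mean(est)                      # combined point estimate
w_var   <- mean(se2)                      # average within-analysis variance
b_var   <- var(est)                       # variance between plausible values
total   <- w_var + (1 + 1 / m) * b_var    # Rubin's total variance

c(estimate = pooled, se = sqrt(total))
```

The between-values term is what a single-plausible-value analysis would miss, which is why its standard errors tend to be too small.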
## Not run:
# Table I.2.3a, p. 305, International Report 2012
pisa.reg.pv(pvlabel="MATH", x="ST04Q01", by = "IDCNTRYL", data=pisa)
However, I'm not sure whether this package can be used to analyze NAEP data.
I hope this fulfills your purposes, at least partially.
As of survey version 3.36 there's withPV:
library(survey)
library(mitools)  # provides the pisamaths data and MIcombine()
data(pisamaths, package="mitools")
des <- svydesign(id=~SCHOOLID+STIDSTD, strata=~STRATUM, nest=TRUE,
                 weights=~W_FSCHWT+condwt, data=pisamaths)
options(survey.lonely.psu="remove")
# Fit the same model once per plausible value, then combine the fits
results <- withPV(list(maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH),
                  data=des,
                  action=quote(svyglm(maths~ST04Q01*(PCGIRLS+SMRATIO)+MATHEFF+OPENPS, design=des)))
summary(MIcombine(results))
