AMOS-like diagrams in R

I am doing an SEM analysis in R using lavaan. My analysis is complete, but I am not satisfied with the diagrams of the measurement and structural models produced by semPlot and lavaanPlot. In lavaanPlot the path arrows are curved, giving a hand-drawn impression; in semPlot the variables and paths are a bit cluttered.
Is it possible to mimic diagrams produced by AMOS in R (not necessarily the colours, but at least the overall structure)? I tried semPlotModel_Amos(model.fit), where model.fit is the object I got by fitting my SEM model to the data.
But it throws an error:
Error in file.info(object) : invalid filename argument
In addition: Warning message:
In semPlotModel_Amos(model3.fit) :
(Residual) variances of Amos model is not yet supported

You can use Onyx, a standalone graphical SEM program, to build a lavaan (or OpenMx) model starting from the path diagram, similar to how Amos works. I think the WebSEM site still provides similar functionality if you want to fit your model through a web interface.
To automatically design a graphic from a fitted lavaan model, I am unaware of options other than the semPlot and lavaanPlot packages you already mentioned. But you could fairly easily build your own custom path diagram using the DiagrammeR package (see their video demonstration). Unfortunately, there must always be a trade-off between ease-of-use/automation and flexibility/control to get exactly the picture you want.
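As a starting point for the DiagrammeR route, here is a minimal sketch that turns a lavaan-style parameter table into Graphviz DOT text, drawing latent variables as ellipses and indicators as boxes in the AMOS style. The helper `sem_to_dot()` is hypothetical (not part of lavaan, semPlot or DiagrammeR); it assumes a data frame with the columns `lhs`, `op`, `rhs` and `est`, which is what `lavaan::parameterEstimates()` returns. The numbers in the example table are placeholders, not real estimates.

```r
# Hypothetical helper: convert a lavaan-style parameter table into DOT text.
sem_to_dot <- function(pe) {
  pe$lhs <- as.character(pe$lhs)
  pe$rhs <- as.character(pe$rhs)
  # Keep loadings (=~), regressions (~) and covariances (~~) only
  pe <- pe[pe$op %in% c("=~", "~", "~~"), , drop = FALSE]
  latent   <- unique(pe$lhs[pe$op == "=~"])
  observed <- setdiff(unique(c(pe$lhs, pe$rhs)), latent)

  # AMOS-style shapes: latent variables as ellipses, indicators as boxes
  nodes <- c(sprintf('  "%s" [shape = ellipse];', latent),
             sprintf('  "%s" [shape = box];', observed))

  load <- pe[pe$op == "=~", , drop = FALSE]
  reg  <- pe[pe$op == "~", , drop = FALSE]
  cov  <- pe[pe$op == "~~" & pe$lhs != pe$rhs, , drop = FALSE]  # skip variances

  edges <- c(
    # factor -> indicator
    sprintf('  "%s" -> "%s" [label = "%.2f"];', load$lhs, load$rhs, load$est),
    # in lavaan syntax `y ~ x`, x predicts y, so the arrow runs rhs -> lhs
    sprintf('  "%s" -> "%s" [label = "%.2f"];', reg$rhs, reg$lhs, reg$est),
    # covariances as dashed double-headed arrows
    sprintf('  "%s" -> "%s" [label = "%.2f", dir = both, style = dashed];',
            cov$lhs, cov$rhs, cov$est)
  )

  paste(c("digraph sem {", "  graph [rankdir = TB];", nodes, edges, "}"),
        collapse = "\n")
}

# Toy input with parameterEstimates()-style columns (placeholder numbers)
pe <- data.frame(
  lhs = c("F1", "F1", "F1", "F2", "F2", "F2", "x1"),
  op  = c("=~", "=~", "=~", "=~", "=~", "~",  "~~"),
  rhs = c("x1", "x2", "x3", "x4", "x5", "F1", "x1"),
  est = c(1, 0.8, 0.7, 1, 0.9, 0.5, 0.3),
  stringsAsFactors = FALSE
)
dot <- sem_to_dot(pe)
cat(dot)
```

With a real fit you would pass `lavaan::parameterEstimates(model.fit)` and render the text with `DiagrammeR::grViz(dot)`. Graphviz then handles the layout, and any node or edge attribute can be adjusted directly in the DOT string.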

Related

Can no longer make predictions with older Rborist model

I have an older random forest model I built with the Rborist package, and I recently stopped being able to make predictions with it. I think older versions of the package produced models of class Rborist and starting with version 0.3-1 it makes models of class rfArb. So the predict method also changed from predict.Rborist to predict.rfArb.
I was originally getting a message saying there was "no applicable method for 'predict' applied to an object of class 'Rborist'" (since it had been deprecated, I think). Then I manually changed the class of my old model to rfArb and started getting a new error message:
Error in predict.rfArb(surv_mod_rf, newdata = trees) :
Sampler state needed for prediction
It looks to me like this has to do with the way rfArb objects are constructed (they have a sampler vector that shows how many times each observation in the sample was sampled, or some such), which is different than the way Rborist objects are constructed. But please let me know if I'm misunderstanding this.
Is there any way to update my old model so I can use the recent version of Rborist to make predictions, or are the only options using an older version of the package or rebuilding the model?
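The suspicion in the question can be reproduced with plain S3 dispatch. In this toy sketch the classes `oldFit` and `newFit` and the field `sampler` are hypothetical stand-ins, not Rborist's internals: changing `class()` only changes which `predict` method is called; it does not add the components the newer method relies on, which is consistent with the "Sampler state needed for prediction" error.

```r
# Toy classes, NOT Rborist's code: a newer predict method that needs a field
# (`sampler`) which objects built by an older version never stored.
predict.newFit <- function(object, newdata, ...) {
  if (is.null(object$sampler)) stop("Sampler state needed for prediction")
  rep(object$center, nrow(newdata))
}

old_model <- structure(list(center = 3.5), class = "oldFit")
nd <- data.frame(x = 1:2)

# 1. No predict method is registered for the old class
msg1 <- tryCatch(predict(old_model, newdata = nd), error = conditionMessage)

# 2. Relabelling the class changes dispatch, but not the object's contents
class(old_model) <- "newFit"
msg2 <- tryCatch(predict(old_model, newdata = nd), error = conditionMessage)

print(msg1)
print(msg2)
```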

Factor Variable Importance (VIMP) with RandomForestSRC Package: cannot coerce to data.frame error

Good afternoon, all--thank you in advance for your help! I'm somewhat new to R, so my apologies if this is a trivial or otherwise inappropriate question.
TL;DR: I'm trying to determine Variable Importance (VIM) for factor variables with a random forest model built in RandomForestSRC, which is not a built-in feature of that package. Using both the LIME and DALEX packages, I encounter the same error: cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame. Any assistance resolving this error, or alternative approaches, would be greatly appreciated!
I have a random forest model I've built in R using the RandomForestSRC package. The model seems to work great: training and testing went fine, I got the predicted output I needed, and the results seem in line with what I would expect. Unfortunately, one of the requirements is that I need to be able to indicate how the model arrived at its conclusions (e.g., I also need to include variable importance as part of the output), for both continuous and factor variables.
This doesn't seem to be a built-in feature of the RandomForestSRC package, so I've looked into both the LIME and DALEX packages, both of which should be able to break out VIM from the existing RF model. Unfortunately, neither has native support for RandomForestSRC, which means I've needed to write the prediction functions myself, as recommended by this vignette: https://uc-r.github.io/dalex
model_type.rfsrc <- function (x, ...) {
return ('classification')
}
predict_model.rfsrc <- function (x, newdata, type, ...) {
as.data.frame(predict(x, newdata, ...))
}
Unfortunately, in running the VIM section of the model (in both LIME and DALEX), I'm asked to pass both the predicted output and the model that created that output. In doing so, it hits an error with the above predict_model function:
Error in as.data.frame.default(predict(model, (newdata))) :
cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame
And, like...of course, it can't; it's trying to turn the model itself into a data frame. Unfortunately, while I think I understand why R is giving me that error, that's about as far as I've been able to figure out on my own.
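What `as.data.frame()` receives here is the list-like prediction object returned by `predict()`, so one way out is to pull the relevant element out of it before coercing. A minimal sketch of lime-style methods, assuming (check `str(predict(model, newdata))` for your version) that `$predicted` holds the matrix of class probabilities and `$class` the predicted labels, and following lime's convention that `type = "prob"` returns one column per class while `type = "raw"` returns the labels:

```r
# Sketch only: element names $predicted and $class are assumptions about the
# object returned by predict() for an rfsrc classification forest.
model_type.rfsrc <- function(x, ...) {
  "classification"
}

predict_model.rfsrc <- function(x, newdata, type, ...) {
  pred <- predict(x, newdata = newdata, ...)
  if (type == "raw") {
    # a single column of predicted labels
    data.frame(Response = pred$class, stringsAsFactors = FALSE)
  } else {
    # type == "prob": one column of probabilities per class
    as.data.frame(pred$predicted, check.names = FALSE)
  }
}
```

Defining the methods in the global environment is enough for lime's generics to dispatch to them once `lime` is loaded.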
Additionally, I'm using the RandomForestSRC package for two reasons: it doesn't put a limit on the number of factor levels, and it can handle imbalanced data. I'm working with medical data, so both of these are necessary (e.g., there are ~100,000 different medical codes that can be encoded in a single data variable, and the ratio of "people-who-don't-have-this-condition" to "people-who-do-have-this-condition" is frequently 100 to 1). If anyone has suggestions for alternative packages that handle these issues and have built-in VIM functionality (or integrate with DALEX / LIME), that would be fantastic as well.
Thank you all very much for your help!

Set up different activation functions for different layers using the "neuralnet" package

Hi,
I am working with neuralnet in R.
I used to build this kind of model with Keras in Python, so I expected to be able to set different activation functions for different layers.
Let me explain. Suppose I want to build a neural net with 2 hidden layers (say with 5 and 4 neurons) and an output between -1 and 1.
I would like to use ReLU or softplus in the hidden layers and tanh in the output layer.
The issue is that the neuralnet package lets me choose only one activation function, via the argument act.fct:
nn <- neuralnet(y ~ x1 + x2, data = data, hidden = c(5, 4), act.fct = "tanh")  # y, x1, x2 are placeholder column names
I tried setting the act.fct argument to c(softplus, softplus, tanh), but of course I get an error because neuralnet expects a single function for that argument.
Do you know how I can set up the network this way? Online I can only find very basic linear neural nets built with this package. If this is not possible, the package would be almost useless, because it could only build "linear models" (??!)
Thanks a lot,
bye
ReLU was added in neuralnet 1.44.4 (not yet on CRAN; you can install it with devtools::install_github("bips-hb/neuralnet")). This version also lets you set the output activation function separately (output.act.fct). However, different activation functions for different hidden layers are not yet possible.
See also here: https://github.com/bips-hb/neuralnet/issues/18.
Online I can only find very basic linear neural nets built with this package. If this is not possible, the package would be almost useless, because it could only build "linear models" (??!)
No, not only linear models. But note that the package is from the pre-deep-learning era (2008) and was not made for deep networks. I would also recommend keras here (the R package is great).
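To make the point about non-linearity concrete, this hand-written forward pass in base R (not neuralnet's API) chains two softplus hidden layers of 5 and 4 units with a tanh output, i.e. the architecture described in the question. The weights are random placeholders; the sketch only illustrates that applying a non-linear activation between layers already yields a non-linear function of the inputs, and that a tanh output stays within [-1, 1].

```r
# Hand-written forward pass; weights are random placeholders, not a trained net.
softplus <- function(x) log1p(exp(x))

forward <- function(X, weights, activations) {
  h <- X
  for (k in seq_along(weights)) {
    # prepend a column of 1s for the bias, then apply layer k's activation
    h <- activations[[k]](cbind(1, h) %*% weights[[k]])
  }
  h
}

set.seed(42)
X <- matrix(rnorm(20), ncol = 2)  # 10 observations, 2 inputs
sizes <- c(2, 5, 4, 1)            # input, hidden 1, hidden 2, output
weights <- lapply(1:3, function(k) {
  matrix(rnorm((sizes[k] + 1) * sizes[k + 1]), nrow = sizes[k] + 1)
})

out <- forward(X, weights, list(softplus, softplus, tanh))
print(out)
```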

Using a 'gbm' model created in R package 'dismo' with functions in R package 'gbm'

This is a follow-up to a previous question I asked a while back that was recently answered.
I have built several gbm models with dismo::gbm.step, which relies on the gbm fitting functions found in R package gbm, as well as cross validation tools from R package splines.
As part of my analysis, I would like to use some of the graphical tools available in R (e.g., perspective plots) to visualize pairwise interactions in the data. Both the gbm and the dismo packages have functions for detecting and modelling interactions in the data.
The implementation in dismo is explained in Elith et al. (2008) and returns a statistic that indicates departures of the model predictions from a linear combination of the predictors, while holding all other predictors at their means.
The implementation in gbm uses Friedman's H statistic (Friedman & Popescu, 2005), returns a different metric, and also does NOT set the other variables at their means.
The interactions modelled and plotted with dismo::gbm.interactions are great and have been very informative. However, I would also like to use gbm::interact.gbm, partly for publication strength and also to compare the results from the two methods.
If I try to run gbm::interact.gbm on a gbm.object created with dismo, an error is returned…
"Error in is.factor(data[, x$var.names[j]]) :
argument "data" is missing, with no default"
I understand dismo::gbm.step adds extra data to the gbm model that the authors thought would be useful.
I also understand that the answer to my question lies somewhere in the source code.
My question is:
Is it possible to modify a gbm object created in dismo so that it can be used in gbm::interact.gbm? If so, would this be accomplished by...
a. Modifying the gbm object created in dismo::gbm.step?
b. Modifying the source code for gbm::interact.gbm?
c. Doing something else?
I will be going through the source code trying to solve this myself; if I come up with a solution before anyone answers, I will answer my own question.
The gbm::interact.gbm function requires data as an argument: interact.gbm(x, data, i.var = 1, n.trees = x$n.trees). Pass the data set that was used to fit the model.
The dismo gbm.object is essentially the same as the gbm gbm.object, but with extra information attached so I don't imagine changing the gbm.object would help.
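A minimal sketch of supplying `data` explicitly. Here a small model is fitted directly with `gbm::gbm()` on simulated data as a stand-in for the object returned by `dismo::gbm.step()`; the variable names and the simulated response are illustrative assumptions. With a dismo model you would pass the same data frame you gave to `gbm.step()`.

```r
library(gbm)

# Simulated data (assumption) with a built-in x1:x2 interaction
set.seed(1)
n <- 500
train <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
train$y <- train$x1 * train$x2 + 0.1 * train$x3 + rnorm(n, sd = 0.05)

# Stand-in for the model that dismo::gbm.step() would return
fit <- gbm(y ~ x1 + x2 + x3, data = train, distribution = "gaussian",
           n.trees = 500, interaction.depth = 2, shrinkage = 0.05)

# Supplying `data` avoids 'argument "data" is missing'; i.var can be
# given as variable names or column indices
h_12 <- interact.gbm(fit, data = train, i.var = c("x1", "x2"),
                     n.trees = fit$n.trees)
h_13 <- interact.gbm(fit, data = train, i.var = c("x1", "x3"),
                     n.trees = fit$n.trees)
print(c(x1_x2 = h_12, x1_x3 = h_13))
```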

standard errors for loess in R

I am looking for a reference that explains how standard errors are computed for local polynomial regression. Specifically, in R one can use the loess function to get a model object and then use the predict function to retrieve standard errors. Is there a reference somewhere describing what is actually happening? And what about the case where there is serial correlation in the residuals? There one must adjust using Newey-West type methods; is there a way to use the sandwich package to do this, as you would for a regular OLS fit with lm?
I tried looking at the source but the standard error computation calls a C function.
The "Source" section of ?loess tells you that the underlying C-code comes from the cloess package of Cleveland et al., and points you to its web home:
Source:
The 1998 version of ‘cloess’ package of Cleveland, Grosse and
Shyu. A later version is available as ‘dloess’ at http://www.netlib.org/a/.
Going there, you will find a link to a 50 page document (warning: postscript doc) that should tell you everything you need to know about this implementation of loess. In Cleveland's words:
This guide describes crucial steps in the proper analysis of data using
loess. Please read it.
Of particular interest will be the first couple pages of "Section 4: Statistical and Computational Methods".
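As a worked complement to that document, the following base R sketch uses the built-in `cars` data to show what `predict(..., se = TRUE)` returns. Loess is a linear smoother, so the fitted values at new points are `L %*% y` for some weight matrix `L`, and under independent errors with constant variance each standard error is `residual.scale * sqrt(sum(L[i, ]^2))`. The sketch uses `surface = "direct"` (exact local fits) so that `L` can be recovered by smoothing unit vectors; with the default interpolated surface the numbers may differ slightly.

```r
# Base R only; `cars` ships with R.
ctrl <- loess.control(surface = "direct")   # exact fits, no kd-tree interpolation
fit  <- loess(dist ~ speed, data = cars, control = ctrl)
new  <- data.frame(speed = c(10, 15, 20))
p    <- predict(fit, newdata = new, se = TRUE)

# Pointwise 95% intervals from se.fit and the reported residual df
band <- data.frame(new, fit = p$fit,
                   lower = p$fit - qt(0.975, p$df) * p$se.fit,
                   upper = p$fit + qt(0.975, p$df) * p$se.fit)

# Column j of the smoother matrix L: smooth the j-th unit vector
n <- nrow(cars)
L <- sapply(seq_len(n), function(j) {
  d <- data.frame(speed = cars$speed, e = as.numeric(seq_len(n) == j))
  predict(loess(e ~ speed, data = d, control = ctrl), newdata = new)
})

# With i.i.d. errors Var(L y) = sigma^2 L L', giving the standard errors below.
# Serially correlated errors would instead need L %*% Sigma %*% t(L).
se_manual <- p$residual.scale * sqrt(rowSums(L^2))
print(band)
print(cbind(se.fit = p$se.fit, se_manual = se_manual))
```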
