How to create response surface using random forest model in R? - r

I have made a random forest model in R with six predictors and a response. The predictive model seems to be good enough, but we also want to generate a response surface for it.
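library(randomForest)
set.seed(1)
# Fit the forest; mtry = 6 tries all six predictors at every split (i.e. bagging)
rfalloy <- randomForest(Mf ~ ., data = al_mf, mtry = 6, importance = TRUE)
rfalloy
# newdata must be a data frame containing the predictors, not the response vector
rfpred <- predict(rfalloy, newdata = al_mf)
rfpred
# Residual sum of squares
ssr <- sum((rfpred - al_mf$Mf)^2)
ssr
# Total sum of squares
sst <- sum((al_mf$Mf - mean(al_mf$Mf))^2)
sst
Rsquare <- 1 - ssr / sst
Rsquare
importance(rfalloy)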

At a general level, since you haven't provided too many specifics about exactly what you are looking for in your response surface, here are a few hopefully helpful starting points:
Have you taken a look at rsm? This documentation provides some good use cases for the package.
These in-class notes from a University of New Mexico stats lecture are full of code examples related to response surfaces. Just check out the table of contents and you'll probably find what you're looking for.
This StackOverflow post also provides an example using the rgl package.
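One common way to get a response surface out of a fitted random forest is to predict over a grid of two predictors while holding the remaining ones fixed. The sketch below assumes this approach; since al_mf is not available, it uses a made-up data frame (toy, with hypothetical predictors x1 to x6), and fixing the other predictors at their medians is just one possible choice.

```r
library(randomForest)

set.seed(1)
# Hypothetical stand-in for al_mf: six predictors (x1..x6) and a response Mf
n <- 200
toy <- as.data.frame(matrix(runif(n * 6), ncol = 6))
names(toy) <- paste0("x", 1:6)
toy$Mf <- 10 * toy$x1 + 5 * sin(2 * pi * toy$x2) + rnorm(n, sd = 0.5)

rf <- randomForest(Mf ~ ., data = toy, importance = TRUE)

# Grid over the two predictors of interest
grid_n <- 30
x1_seq <- seq(min(toy$x1), max(toy$x1), length.out = grid_n)
x2_seq <- seq(min(toy$x2), max(toy$x2), length.out = grid_n)
grid <- expand.grid(x1 = x1_seq, x2 = x2_seq)

# Hold the remaining predictors at their medians (an assumption, not the only option)
for (v in paste0("x", 3:6)) grid[[v]] <- median(toy[[v]])

# expand.grid varies x1 fastest, so rows of z index x1 and columns index x2
z <- matrix(predict(rf, newdata = grid), nrow = grid_n, ncol = grid_n)

# Draw the surface; contour(x1_seq, x2_seq, z) gives a 2D view of the same grid
persp(x1_seq, x2_seq, z, theta = 30, phi = 25,
      xlab = "x1", ylab = "x2", zlab = "Predicted Mf", ticktype = "detailed")
```

Because a random forest predicts piecewise-constant values, the surface will look stepped rather than smooth; a finer grid does not change that.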

Related

Difference between "mlp" and "mlpML"

I'm using the caret package in R to create prediction models for maximum energy demand. What I need is a multilayer perceptron neural network, but I found that the caret package has two MLP methods, "mlp" and "mlpML". What is the difference between the two?
I have read the description in a book (Advanced R Statistical Programming and Data Models: Analysis, Machine Learning, and Visualization), but it still doesn't answer my question.
caret has 238 different models available! However, many of them are just different ways of calling the same basic algorithm.
Besides mlp, there are 9 other methods for calling a multilayer perceptron, one of which is mlpML. The real difference is only in the parameters of the function call; which model you need depends on your use case and on what you want to adapt in the basic model.
Chances are that if you don't know what mlpML, mlpWeightDecay, etc. do, you are fine just using the basic mlp.
Looking at the official documentation, we can see that:
mlp(size) while mlpML(layer1,layer2,layer3), so with the first method you can tune only a single size parameter, while with the second you can tune each layer individually.
Looking at the source code here:
https://github.com/topepo/caret/blob/master/models/files/mlp.R
and here:
https://github.com/topepo/caret/blob/master/models/files/mlpML.R
It seems that the difference is that mlpML allows several hidden layers:
modelInfo <- list(label = "Multi-Layer Perceptron, with multiple layers",
while mlp has one single layer with hidden units.
The official documentation also hints at this difference. In my opinion, it is not particularly useful to have many different models that differ only very slightly, and the documentation does not explain those slight differences well.
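The difference in tuning parameters described above can be sketched with the grids one would pass to the tuneGrid argument of caret's train(). The grid values are arbitrary illustrations, and the comment about a size of 0 dropping a layer is my reading of the linked mlpML.R source, so treat it as an assumption.

```r
# Tuning grid for method = "mlp": one parameter, the number of hidden units
grid_mlp <- expand.grid(size = c(3, 5, 7))

# Tuning grid for method = "mlpML": one size per hidden layer
# (assumption from the caret source: a layer of size 0 is dropped)
grid_mlpML <- expand.grid(layer1 = c(3, 5), layer2 = c(0, 3), layer3 = 0)

print(grid_mlp)
print(grid_mlpML)

# If caret is installed, modelLookup() lists the tunable parameters of each method
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::modelLookup("mlp"))
  print(caret::modelLookup("mlpML"))
}
```

With these grids, train(..., method = "mlp", tuneGrid = grid_mlp) would compare three single-layer networks, while method = "mlpML" with grid_mlpML would compare four architectures with one or two hidden layers.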

Fable functions - theoretical questions

My master's thesis is in health forecasting and I'm using R (fable, fabletools, fasster) to implement the methods.
For the theoretical part of the thesis, I need to know the heuristics and the theoretical basis of each function I use.
I have been using Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos, and I have already read the R documentation on these functions, but I still have some doubts.
I need information such as which theoretical method they follow (ARIMA, moving averages, ANN, etc.), the mathematical expression they use, and how the best fit is chosen (for automatic methods).
I use the following methods and gathered some information about each one.
I'm new in this field and I need some help.
Is this correct? Can anyone add anything else about any of the functions?
ARIMA() - MSARIMA model (meaning an ARIMA model that is sensitive to seasonality and can take into account several external regressors);
SNAIVE() - Linear regression with seasonality;
NNETAR() - ANN model;
fasster()
ETS()
Thank you in advance!
The book you cite contains information on how SNAIVE, NNETAR, ETS, and ARIMA forecasts are calculated. It explains that for model classes such as ETS and ARIMA, the AICc is used to select a particular model. It gives equations for all these methods. Please read it.
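As an illustration of what the book's equations look like in practice, here is a minimal base R sketch of the seasonal naive method as the book defines it: each forecast equals the last observed value from the same season, so it is not a regression model. The function name snaive_forecast and the data are made up for this example; fable's SNAIVE() is the real implementation.

```r
# Seasonal naive forecast: yhat[T + h] = y[T + h - m * (k + 1)],
# where m is the seasonal period and k = floor((h - 1) / m).
# Equivalently: repeat the last observed season for as long as needed.
snaive_forecast <- function(y, m, h) {
  stopifnot(length(y) >= m, h >= 1)
  last_season <- tail(y, m)
  rep(last_season, length.out = h)
}

y <- c(10, 20, 30, 40, 12, 22, 32, 42)  # two "years" of made-up quarterly data
snaive_forecast(y, m = 4, h = 6)
# returns 12 22 32 42 12 22
```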
fasster() is a new method that is not fully documented yet. The readme file (https://github.com/tidyverts/fasster) provides some information, and there is a talk by the author (https://www.youtube.com/watch?v=6YlboftSalY) explaining the state space modelling framework behind it.

Demonstration Code for Nested Dirichlet Process

My question is about how to implement the nested Dirichlet process (NDP) with R code.
The NDP is suitable for clustering over distributions and simultaneously clustering within a distribution. Rodriguez et al. (2008) provided a simulation example to demonstrate the ability of the NDP to distinguish different distributions. I am trying to learn this approach by reproducing the results for this example, but I have failed to do so because I do not understand how the base distribution is related to the mixture components.
The simulation example used a normal inverse-gamma distribution, NIG(0,0.01,3,1), as the base distribution. But the four different distributions are:
The algorithm provided in Section 4 (Rodriguez et al., 2008, p. 1135) was used to do the simulation. I have trouble understanding and executing this algorithm, especially step 5:
Can you please provide sample code to demonstrate this algorithm? Your help is highly appreciated!
I have not been able to do the coding by myself, but I have found a recent paper which does the simulation using exact inference instead of the truncation approximation. I think it might help others who are interested in this as I am, so I am posting the link to that paper here.
The good thing I like about this paper is that it is well written and has source code (in R) for me to understand this methodology better.
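On the question of how the base distribution relates to the mixture components, a sketch of drawing from a truncated NDP prior may help: every mixture component's (mean, variance) pair is an atom drawn from the base measure. This is not the posterior sampler from Section 4 of the paper, only a forward simulation. The truncation levels, concentration parameters, and especially the NIG parameterization used here are assumptions for illustration; check them against the paper.

```r
set.seed(123)

# Truncated stick-breaking weights; the last stick takes all remaining mass
stick_break <- function(n_sticks, conc) {
  v <- c(rbeta(n_sticks - 1, 1, conc), 1)
  v * cumprod(c(1, 1 - v[-n_sticks]))
}

# One atom (mu, sigma2) from the base measure H = NIG(m, kappa, a, b).
# ASSUMED parameterization: sigma2 ~ Inv-Gamma(a, b), mu | sigma2 ~ N(m, sigma2 / kappa)
draw_atom <- function(m = 0, kappa = 0.01, a = 3, b = 1) {
  sigma2 <- 1 / rgamma(1, shape = a, rate = b)
  mu <- rnorm(1, mean = m, sd = sqrt(sigma2 / kappa))
  c(mu = mu, sigma2 = sigma2)
}

K <- 10; L <- 15       # truncation levels (illustrative)
alpha <- 1; beta <- 1  # concentration parameters (illustrative)

# Top level: weights over K candidate distributions
pi_k <- stick_break(K, alpha)

# Bottom level: candidate distribution k is a normal mixture with
# weights w[, k] and component parameters atoms[[k]], all drawn from H
w <- sapply(seq_len(K), function(k) stick_break(L, beta))              # L x K matrix
atoms <- replicate(K, t(replicate(L, draw_atom())), simplify = FALSE)  # K matrices, L x 2

# Simulate J groups: each group picks a distribution, each observation a component
J <- 6; n_per_group <- 50
zeta <- sample(K, J, replace = TRUE, prob = pi_k)
y <- lapply(zeta, function(k) {
  xi <- sample(L, n_per_group, replace = TRUE, prob = w[, k])
  rnorm(n_per_group, mean = atoms[[k]][xi, "mu"], sd = sqrt(atoms[[k]][xi, "sigma2"]))
})
```

Groups that draw the same value of zeta share an entire distribution (clustering over distributions), while xi clusters observations within a group.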

Bioassay dose response fitting with heteroscedastic data

I am using the drc package in R to fit dose response curves (4-param logistic: LL.4) for biological assays. The data I collect is typically heteroscedastic (example image below). I am looking for ways to account for this when calling drm. I have found three possibilities that seem promising:
Use the type="Poisson" parameter to drm. However, over- and under-dispersion are probable for many assays, so this isn't likely to be a general solution.
Follow drm with a call to drc.boxcox. This seems to be more general and could work.
Use the "varPower" transform that used to be implemented in drc.multdrc and in drc.drm before it was commented out (search for "varPower" in the drm source). I could un-comment those sections to restore the varPower functionality.
My questions are, what is the most accepted way to handle this? Also, does anyone know why varPower variance handling was removed from the drc package?
Example code:
library(drc)
# Naive method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params)
# Poisson method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params, type = "Poisson")
# Box-Cox method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params)
a2 <- boxcox(a)
Example Data:
I found the answer to this question in this paper by the authors of the drc package. In the paper they comment:
Weights may be used for addressing variance heterogeneity in the
response. However, the transform-both-sides approach should be
preferred over using often very imprecisely determined weights
The "transform-both-sides" approach refers to using the drc.boxcox function (code in the original question).
Further advice was provided in a personal communication with one of the authors of the drc package. He advised that presently, the medrc R package is better suited for dose response analysis in R.

Setting Contrasts for ANOVA in R

I've been attempting to perform an ANOVA in R recently on the attached data frame.
My question revolves around the setting of contrasts.
My design is a 3x5 within-subjects design.
There are 3 visual conditions under 'Circle1' and 5 audio under 'Beep1'.
Does anyone have any idea how I should set the contrasts? This is something I'm unfamiliar with as I'm making the transition from point and click stats in SPSS to coded in R.
Thanks for your time
Data file:
Reiterating my answer from another Stack Overflow question that was flagged as similar: since you didn't provide any code, you might start by having a look at the contrast package in R. As the package documentation notes:
"The purpose of the contrast package is to provide a standardized interface for testing linear combinations of parameters from common regression models. The syntax mimics the contrast.Design function from the Design library. The contrast class has been extended in this package to linear models produced using the functions lm, glm, gls, lme and geese."
There is also a nice little tutorial here by Dr. William King, who talks about factorial between-subjects ANOVA and also includes an abundance of R code. This is wider in scope than your question but would be a great place to start (just to get context).
Finally, here is another resource that you can refer to which talks about setting up orthogonal contrasts in R.
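For a concrete starting point, contrasts can also be set directly on the factors in base R before fitting. The sketch below assumes a 3x5 within-subjects layout like the one described; the data, the Subject and Score columns, and the particular contrast choices are all made up for illustration.

```r
set.seed(42)

# Hypothetical data: 10 subjects, each measured in all 3 x 5 conditions
dat <- expand.grid(
  Subject = factor(1:10),
  Circle1 = factor(c("v1", "v2", "v3")),
  Beep1   = factor(c("a1", "a2", "a3", "a4", "a5"))
)
dat$Score <- rnorm(nrow(dat), mean = 50, sd = 10)

# Custom orthogonal contrasts for the 3-level factor:
# level 1 vs. the other two, then level 2 vs. level 3
contrasts(dat$Circle1) <- cbind(first_vs_rest   = c(2, -1, -1),
                                second_vs_third = c(0, 1, -1))

# Built-in Helmert contrasts for the 5-level factor
contrasts(dat$Beep1) <- contr.helmert(5)

# Repeated-measures ANOVA with both factors within subjects
fit <- aov(Score ~ Circle1 * Beep1 + Error(Subject / (Circle1 * Beep1)), data = dat)
summary(fit)
```

Other built-in options such as contr.sum() or contr.poly() can be assigned in the same way; which set is appropriate depends on the hypotheses you want to test.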
