Output posterior distribution from Bayesian network in R (bnlearn)

I'm experimenting with Bayesian networks in R and have built some networks using the bnlearn package. I can use them to make predictions for new observations with predict(), however I would also like to have the posterior distribution over the possible classes. Is there a way of retrieving this information?
It seems like there is a prob-parameter that does this for the naive bayes implementation in the bnlearn package, but not for networks fitted with bn.fit.
Thankful for any help with this.

See the bnlearn documentation for predict.
The predict function implements the prob argument only for naive.bayes and TAN classifiers.
In short, this is because the other methods do not necessarily compute posterior probabilities.
From the bnlearn documentation: predict returns the predicted values for node given the data specified by data. Depending on the value of method, the predicted values are computed as follows:
a) parents
b) bayes-lw
When using bayes-lw, likelihood weighting simulations are performed to make the predictions.
Hope this helps. :)
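If an approximate posterior over the classes is still needed for a network fitted with bn.fit, one possible workaround is to sample from the conditional distribution with cpdist and likelihood weighting, then normalise the weights per class. The sketch below is only an illustration under assumptions that are not in the original answer: it uses the learning.test data set shipped with bnlearn, treats node A as the class, and uses B == "b" as the evidence.

```r
library(bnlearn)

# learning.test is a small discrete data set shipped with bnlearn
# (used here purely for illustration).
data(learning.test)
dag <- hc(learning.test)               # learn a structure by hill climbing
fitted <- bn.fit(dag, learning.test)   # estimate the conditional probability tables

set.seed(1)
# Draw weighted samples of A given the evidence B == "b" (likelihood weighting).
samples <- cpdist(fitted, nodes = "A", evidence = list(B = "b"),
                  method = "lw", n = 10000)
w <- attr(samples, "weights")

# Weighted relative frequencies approximate P(A | B = "b").
posterior <- tapply(w, samples$A, sum) / sum(w)
print(round(posterior, 3))
```

The result is a Monte Carlo estimate, so it changes slightly with the seed and becomes more stable as n grows.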

Related

How to check overfitting for a point pattern on a linear network using spatstat

I have been using lppm (point pattern on a linear network) in spatstat with a bunch of covariates, fitting a log-linear model, but I couldn't see how to check for over-fitting. Is there a quick way to do it?
It depends on what you want.
What tool would you use to check overfitting in (say) a linear model?
To identify whether individual observations may have been over-fitted, you could use influence.lppm (from the spatstat.linnet package).
To identify collinearity in the covariates, currently we do not provide a dedicated function in spatstat, but you could use the following trick. If fit is your fitted model of class lppm, first extract the corresponding GLM using
g <- getglmfit(as.ppm(fit))
Next install the package faraway and use the vif function to calculate the variance inflation factors
library(faraway)
vif(g)

predicting with zoib models (MCMC / RJags)

I am using the zoib package in R to build zero-inflated beta regression models. I am looking for a simple way to use the models that zoib produces to calculate a predicted response for a new dataset. By "new dataset" I mean data not used to build the original zoib models.
I know I can just take the zoib model parameters and manually write a function in R to predict with but I want to utilise the fact that zoib models are Bayesian so I can get a posterior distribution of possible response values. My plan is to use the posterior distributions to calculate confidence intervals around each prediction.
Because zoib uses a MCMC approach within RJags I have investigated these two solutions:
manipulating the code within RJags
appending the new data with an "NA" response variable
The first solution I don't know how to implement because zoib runs RJags internally and the zero-inflated model it runs is very complicated. I tried the second solution but it just ignored the rows of data that I appended with "NA" response values.
I emailed the zoib package developers and this was their response:
For now, the zoib function can only output posterior predictive samples for Y given the X in the data set where the zoib regression is applied to, but not for a new set of X's. Your suggestion can be easily incorporated into the new version of the package, which is expected to be out in about a few weeks.

R CRAN Neural Network Package compute vs prediction

I am using R along with the neuralnet package see docs (https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf). I have used the neural network function to build and train my model.
Now that I have built my model I want to test it on real data. Could someone explain whether I should use the compute or the prediction function? I have read the documentation and it isn't clear; both functions seem to do similar things.
Thanks
The short answer is to use compute to do predictions.
You can see an example of using compute on the test set here. We can also see that compute is the right one from the documentation:
compute, a method for objects of class nn, typically produced by neuralnet. Computes the outputs
of all neurons for specific arbitrary covariate vectors given a trained neural network.
The above says that you can use covariate vectors in order to compute the output of the neural network i.e. make a prediction.
On the other hand prediction does what is mentioned in the title in the documentation:
Summarizes the output of the neural network, the data and the fitted
values of glm objects (if available)
Moreover, it only takes two arguments: the nn object and a list of glm models so there isn't a way to pass in the test set in order to make a prediction.
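As a minimal sketch of the compute workflow described above, the following trains a small network on synthetic data and evaluates it on unseen covariates. The data, the y ~ x formula and the choice of three hidden units are assumptions made for illustration only; they do not come from the question.

```r
library(neuralnet)

set.seed(42)
# Synthetic training data (illustrative only): learn y = x^2 on [0, 1].
train <- data.frame(x = runif(100))
train$y <- train$x^2

nn <- neuralnet(y ~ x, data = train, hidden = 3)

# New covariates that were not used for training.
test <- data.frame(x = c(0.2, 0.5, 0.8))

# compute() returns a list; net.result holds the network outputs,
# one row per row of the covariate data.
out <- compute(nn, test)
print(out$net.result)
```

The covariate columns passed to compute should match the predictors used in the training formula, in the same order.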

randomForest in R: Is there a possibility of calculating casewise confidence intervals?

R package randomForest reports mean squared errors for each tree in the forest. I need, however, a measure of confidence for each case in the data. Since randomForest calculates the casewise predictions by averaging the predictions of the single trees, I guess that it should also be possible to calculate a casewise standard error and thus a confidence interval. Can this be done using the output randomForest object (if so: how?) or do I have to dig into the source code?
No need to dig into the source code. You only need to read the documentation. ?predict.randomForest states that one of its arguments is called predict.all:
predict.all Should the predictions of all trees be kept?
So setting that to TRUE will keep a prediction for each case, for each tree, which you can then use to calculate standard error for each case.
I have recently been made aware of this paper by Stefan Wager, Trevor Hastie and Brad Efron which investigates more rigorously the idea of standard errors for the predictions generated by random forests (and other bagged predictors).
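A minimal sketch of the predict.all approach, assuming a regression forest on the built-in mtcars data (the data set and the choice of 500 trees are illustrative assumptions). Note that the spread across trees is a rough measure of per-case variability rather than a formally justified standard error.

```r
library(randomForest)

set.seed(1)
# Regression forest on a built-in data set (illustrative only).
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500)

# predict.all = TRUE returns a list: `aggregate` is the usual forest
# prediction and `individual` is a cases-by-trees matrix.
pred <- predict(rf, newdata = mtcars[1:5, ], predict.all = TRUE)

# Per-case spread of the individual tree predictions.
tree_sd <- apply(pred$individual, 1, sd)

# Empirical 5% and 95% quantiles of the tree predictions for each case.
tree_q <- t(apply(pred$individual, 1, quantile, probs = c(0.05, 0.95)))

print(cbind(prediction = pred$aggregate, sd = tree_sd, tree_q))
```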

prediction intervals with caret

I've been using the caret package in R to run some boosted regression tree and random forest models and am hoping to generate prediction intervals for a set of new cases using the inbuilt cross-validation routine.
The trainControl function allows you to save the hold-out predictions at each of the n-folds, but I'm wondering whether unknown cases can also be predicted at each fold using the built-in functions, or whether I need to use a separate loop to build the models n-times.
Any advice much appreciated
Check the R package quantregForest, available on CRAN. It can easily calculate prediction intervals for random forest models. There's a nice paper by the author of the package explaining the background of the method. (Sorry, I can't say anything about prediction intervals for BRT models; I'm looking for them myself...)
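A short sketch of how quantregForest might be used for this, assuming the built-in mtcars data and a 90% interval (both are illustrative assumptions; the what argument is the interface of recent package versions, so check the documentation of the installed version).

```r
library(quantregForest)

set.seed(1)
# Predictors and response from a built-in data set (illustrative only).
x <- mtcars[, -1]
y <- mtcars$mpg

qrf <- quantregForest(x = x, y = y, ntree = 500)

# `what` selects the conditional quantiles to return; the 5% and 95%
# quantiles together form a 90% prediction interval per case.
interval <- predict(qrf, newdata = x[1:5, ], what = c(0.05, 0.95))
print(interval)
```

In practice the new cases would be held-out data rather than rows already used for fitting.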
