Does bnlearn test for non-linear correlations?

I've recently started using the bnlearn package in R for detecting Markov Blankets of a target node in the dataset.
Based on my understanding of Bayesian inference, two nodes are connected if there is a causal relationship between them, and this is measured using conditional independence tests that check for correlation while taking potential confounders into account.
I just wanted to clarify whether bnlearn checks for both linear and non-linear correlations in these tests. I tried looking for this in the package documentation but wasn't able to find anything.
It would be really helpful if someone could explain how bnlearn performs the CI tests.
Thanks a bunch <3

Correlation implies statistical dependence, but not vice versa. There are cases of statistical dependence where there is no correlation, e.g. in periodic signals (the correlation between sin(x) and x is very low when taken over many periods). The concept of statistical dependence is more abstract than correlation, which is why the documentation is phrased in those terms.
As the sin(x) example shows, this is indeed a non-linear dependency, and it is the kind of dependence a Bayesian network should capture.
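To make this concrete, here is a small sketch (assuming the bnlearn package is installed) that runs bnlearn's `ci.test` on exactly this kind of non-linear dependency. Which dependencies a test can detect depends on the test you choose: a linear correlation test can miss the sin(x) relationship, while a mutual-information test on discretized data can pick up general dependence.

```r
# Sketch (assumes the bnlearn CRAN package): compare a linear CI test
# with a mutual-information test on a non-linear dependency.
library(bnlearn)

set.seed(1)
x <- runif(1000, 0, 20)                       # several periods of sin()
d <- data.frame(x = x, y = sin(x) + rnorm(1000, sd = 0.1))

# Linear (Pearson) correlation test: can miss this dependency.
ci.test("y", "x", data = d, test = "cor")

# Discretize and use a mutual-information test, which can detect
# general (non-linear) dependence between the discretized variables.
dd <- discretize(d, breaks = 5)
ci.test("y", "x", data = dd, test = "mi")
```

The structure-learning functions accept the same `test` argument, so the choice of CI test carries over to Markov blanket discovery as well.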

Related

Comparison of Different Types of Nonlinear Regression Models

Thank you for reading this post.
I am applying various regression models to estimate a curve (an actual measured ventilation rate).
I compared GLM and GAM models, including polynomial regression. I use R.
Are there any other types of regression models that can fit that curve well?
Like... Bayesian models? (In fact, I don't even know whether they can be applied here.)
Sincerely.
Loads of non-linear methods exist, and many would give similar results, but this is a statistics question; it would fit better on Cross Validated. Also, you need to say: do you want to do interpolation, or even extrapolation? What is your analysis for?
Bayesian methods are used when you have knowledge of the phenomenon prior to the data, or in some cases when you want to apply regularization or graphical models to the data-generating process. I think you should leave Bayesian statistics out here.
edit:
To be short: if you want a readable formula for the curve, and you don't have a specific mathematical model in mind, go for a polynomial fit. Other popular choices, which are better when you only want to plot the curve rather than report it as a mathematical expression, are smoothing splines and LOESS. For further details, maybe ask a new question on stats.stackexchange.com after studying the alternatives more carefully.
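The three options above can be sketched in a few lines of base R. This is only an illustration on synthetic data (the data frame `df` and its columns are made up for the example); with your own data, substitute your measured ventilation series.

```r
# Sketch: polynomial fit, smoothing spline, and LOESS in base R,
# on synthetic data standing in for the measured curve.
set.seed(1)
df <- data.frame(x = seq(0, 10, length.out = 100))
df$y <- sin(df$x) + rnorm(100, sd = 0.2)

fit_poly  <- lm(y ~ poly(x, 5), data = df)        # polynomial: readable formula
fit_spl   <- smooth.spline(df$x, df$y)            # smoothing spline
fit_loess <- loess(y ~ x, data = df, span = 0.3)  # LOESS

# Predictions over a grid, e.g. for plotting the three curves.
grid <- data.frame(x = seq(0, 10, length.out = 200))
p_poly  <- predict(fit_poly, newdata = grid)
p_spl   <- predict(fit_spl, x = grid$x)$y
p_loess <- predict(fit_loess, newdata = grid)
```

Only the polynomial fit gives you coefficients you can report as a formula; the spline and LOESS fits are better treated as plotted curves.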

R CRAN Neural Network Package compute vs prediction

I am using R along with the neuralnet package see docs (https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf). I have used the neural network function to build and train my model.
Now that I have built my model, I want to test it on real data. Could someone explain whether I should use the compute or the prediction function? I have read the documentation and it isn't clear; both functions seem to do similar things.
Thanks
The short answer is to use compute to do predictions.
You can see an example of using compute on the test set here. We can also see that compute is the right one from the documentation:
compute, a method for objects of class nn, typically produced by neuralnet. Computes the outputs
of all neurons for specific arbitrary covariate vectors given a trained neural network.
The above says that you can pass in covariate vectors to compute the output of the neural network, i.e. make a prediction.
On the other hand, prediction does what its title in the documentation says:
Summarizes the output of the neural network, the data and the fitted
values of glm objects (if available)
Moreover, it only takes two arguments, the nn object and a list of glm models, so there is no way to pass in the test set in order to make a prediction.
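Putting it together, a minimal sketch of the train-then-compute workflow (assuming the neuralnet package is installed; the data frame and column names here are invented for the example):

```r
# Sketch (assumes the neuralnet CRAN package): train on part of the
# data, then use compute() on held-out covariates to get predictions.
library(neuralnet)

set.seed(1)
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- d$x1 + d$x2 + rnorm(200, sd = 0.05)

train <- d[1:150, ]
test  <- d[151:200, ]

nn <- neuralnet(y ~ x1 + x2, data = train, hidden = 3)

# compute() takes only the covariate columns of the new data;
# the network outputs are in the net.result component.
out   <- compute(nn, test[, c("x1", "x2")])
preds <- out$net.result
```

Note that `compute()` must be given the covariates in the same order as in the training formula.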

Quantifying importance of variables in Canonical Correspondence Analysis using R? (x-post from researchgate)

I currently have species abundance data for multiple lakes along with measurements of some environmental variables of those lakes. I decided to do Canonical Correspondence Analysis of the data in R, as demonstrated by ter Braak and Verdenschot (1995), see link: http://link.springer.com/article/10.1007%2FBF00877430 (section: "Ranking environmental variables in importance")
I am not very good with R yet, and I do not have access to the software specified in the article (CANOCO). My problem is, in order to do stepwise ranking of the importance of environmental variables, I have to obtain the Lambda (is this the same as Wilks' lambda?) and perform a Monte Carlo permutation test on each CCA constrained axis.
Does anybody know how I can do this in R? I would love to be able to use this analysis.
You want the anova() method that vegan provides for cca(), the function that does CCA in the package, if you want to test effects in a current model. See ?anova.cca for details and perhaps the by = "margin" option to test marginal terms.
To do stepwise selection you have two options:
Use the standard step() function, which works with an AIC-like statistic for CCA, or
For the sort of selection done in that paper and implemented in CANOCO, use ordistep(). This does forward selection and backward elimination, testing changes to models via permutation tests.
Lambda here is often used to denote the eigenvalues and is not Wilks' lambda. The pseudo-F statistic is the one mentioned in the paper: it is computed in the test, and permutations give its sampling distribution under the null hypothesis, which ultimately determines the significance of terms in the model, or whether a term enters or leaves the model.
See ?ordistep for more details.
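A short sketch of both steps, using vegan's built-in example data (assuming the vegan package is installed; with your own data, substitute your species and environment tables):

```r
# Sketch (assumes the vegan CRAN package and its bundled example data):
# permutation tests and forward selection for a CCA model.
library(vegan)
data(varespec, varechem)

# Constrained ordination on a few environmental variables.
m <- cca(varespec ~ N + P + K, data = varechem)

# Permutation test of each marginal term (pseudo-F under the null).
anova(m, by = "margin")

# Forward selection via permutation tests, in the spirit of CANOCO.
m0 <- cca(varespec ~ 1, data = varechem)
ordistep(m0, scope = formula(m), direction = "forward")
```

The eigenvalues reported for the selected model correspond to the "Lambda" values used for ranking variables in the paper.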

Output posterior distribution from bayesian network in R (bnlearn)

I'm experimenting with Bayesian networks in R and have built some networks using the bnlearn package. I can use them to make predictions for new observations with predict(); however, I would also like to have the posterior distribution over the possible classes. Is there a way of retrieving this information?
It seems there is a prob parameter that does this for the naive Bayes implementation in the bnlearn package, but not for networks fitted with bn.fit.
Thankful for any help with this.
See the documentation of bnlearn: the predict function implements prob only for naive.bayes and TAN models.
In short, this is because all other methods do not necessarily compute posterior probabilities.
From the documentation:
predict returns the predicted values for node given the data specified by data. Depending on the value of method, the predicted values are computed as follows: a) parents, b) bayes-lw.
When using bayes-lw, likelihood weighting simulations are performed to make the predictions.
Hope this helps. :)
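If you need the posterior distribution itself rather than just the predicted class, one workaround is to query the fitted network directly with likelihood weighting. A small sketch (assuming the bnlearn package; it uses bnlearn's bundled discrete data set, and the particular nodes and levels are just for illustration):

```r
# Sketch (assumes the bnlearn CRAN package): approximate the posterior
# over a node's levels with likelihood-weighting queries.
library(bnlearn)

data(learning.test)                  # built-in discrete data set
dag <- hc(learning.test)             # learn a structure
fit <- bn.fit(dag, learning.test)    # fit the parameters

# Approximate posterior probability P(A = "a" | B = "b", C = "c").
cpquery(fit, event = (A == "a"),
        evidence = list(B = "b", C = "c"), method = "lw")

# Or sample from the conditional distribution of A and tabulate it
# to get the whole (approximate) posterior over A's levels.
s <- cpdist(fit, nodes = "A",
            evidence = list(B = "b", C = "c"), method = "lw")
prop.table(table(s))
```

Being simulation-based, the probabilities vary slightly between runs; increase the number of samples (the `n` argument) for more stable estimates.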

What R packages are available for binary data that is both correlated and clustered?

I'm working on a project now that's rather unlike anything I've done before. I have two tests with binary results that will be administered to the same sample, which is drawn from a clustered population (i.e., some subjects will be from the same family). I'd like to compare proportions of positive test results, but the clustering makes McNemar's test inappropriate so I've been reading up on alternative approaches. The two main routes seem to be 1) the clustering-adjusted McNemar alternatives by Rao and Scott (1992), Eliasziw and Donner (1991), and Obuchowski (1998), and 2) GEE.
Do you know of any implementations of the Rao-Obuchowski lineage in R (or, I suppose, SAS)? GEE is easy to find, but have you had a positive or negative experience with any particular packages? Is there another route to analyzing these data that I'm completely missing?
You could always just use a clustered bootstrap. Resample across families, which you believe are independent; that is, keep families together when you resample. Compute p2 - p1 for each sample. After 1000 iterations or so, compute the upper and lower 2.5% quantiles. This gives you a bootstrapped 95% confidence interval. Alternatively, compute the fraction of samples above zero, or whatever your hypothesis is. The procedure should have pretty good properties unless the number of families is small.
It's probably easiest to do this by hand in R rather than relying on any package.
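The clustered bootstrap above is indeed only a few lines in base R. A sketch (the data frame `d` and its column names `family`, `test1`, `test2` are invented for the example; substitute your own data):

```r
# Sketch of the clustered bootstrap: resample whole families with
# replacement, keeping family members together, and bootstrap p2 - p1.
set.seed(1)
d <- data.frame(family = rep(1:50, each = 4),
                test1  = rbinom(200, 1, 0.30),
                test2  = rbinom(200, 1, 0.40))

fams <- unique(d$family)
boot_diff <- replicate(1000, {
  # Sample families, not subjects, so within-family correlation survives.
  pick <- sample(fams, length(fams), replace = TRUE)
  db <- do.call(rbind, lapply(pick, function(f) d[d$family == f, ]))
  mean(db$test2) - mean(db$test1)      # p2 - p1 for this resample
})

# 95% percentile confidence interval for p2 - p1.
quantile(boot_diff, c(0.025, 0.975))

# One-sided check: fraction of resamples with p2 - p1 above zero.
mean(boot_diff > 0)
```

Note that a family drawn twice contributes all its members twice, which is exactly what keeps the resampling consistent with the clustered design.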
Check out the survey package: it is designed to take into account correlations induced by clustered sampling.
Have you already checked the CorrBin package in R?
It is for the analysis of correlated binary data. There is a paper by Szabo, "Using the CorrBin package for nonparametric analysis of correlated binary data", which covers the Rao-Scott test, stochastic ordering, and three versions of a GEE-based test.
The clust.bin.pair package for clustered binary matched-pair data was recently published to CRAN.
It contains implementations of Eliasziw and Donner (1991) and Obuchowski (1998), as well as two more recent tests in the same family, Durkalski (2003) and Yang (2010).