Change weight threshold in bnlearn package - r

I am using the bnlearn package in R, which learns Bayesian networks from data. I am trying to get more connections between the data nodes, and hence I am trying to decrease the weight threshold necessary to generate arcs between the nodes. I am using the gs function in the bnlearn package, which uses the grow-shrink algorithm. So far I have tried modifying the alpha threshold, but that appears to change the significance level of the independence tests rather than an arc weight threshold.
Ultimately, my goal is to have the algorithm create more arcs between the points.
Thanks

You might need to first find the weight of each arc and then selectively filter them yourself; I don't think bnlearn has that built in.
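For reference, a minimal sketch of that approach; it assumes a data frame dat of discrete variables, and the cutoff values are purely illustrative.
library(bnlearn)
net <- gs(dat, alpha = 0.05)    # learn the structure with grow-shrink
# For a constraint-based network, arc.strength() reports the test p-value of
# each arc, so smaller values mean stronger support.
str <- arc.strength(net, data = dat)
str[str$strength < 0.10, ]      # inspect/keep arcs below your own cutoff
# Alternatively, bootstrap arc strengths and average them with a lower
# inclusion threshold to obtain a denser network.
boot <- boot.strength(dat, R = 200, algorithm = "gs")
denser <- averaged.network(boot, threshold = 0.3)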

Related

R package for survey raking that does automatic cell collapsing

I know there are various R packages for performing raking (i.e. calibration to external estimates, iterative proportional fitting, etc.) to construct survey weights. I want to find a package that automatically collapses cells if a cell count falls below a certain value. Is there a package out there with such a feature? Or, if not raking exactly, a weighting package for a similar algorithm (e.g. GREG, entropy balancing) that has such a feature for matching to targets. Thank you.
In my initial research, packages like "Ipfp: Multidimensional Iterative Proportional Fitting" didn't seem to have the feature I wanted.

Decision boundary for SVM with caret (R)

I have built an SVM-RBF model in R using caret. Is there a way to plot the decision boundary?
I know it is possible to do so with other R packages, but unfortunately I am forced to use caret because it is the only package I have found that lets me calculate variable importance.
Alternatively, can you suggest a package that can plot the decision boundary AND also gives variable importance?
Thank you very much
First of all, unlike some other methods, SVM does not itself produce a feature importance. In your case, the importance score caret reports is calculated independently of the model: https://topepo.github.io/caret/variable-importance.html#model-independent-metrics
Second, the decision boundary (or hyperplane) you see in most textbook examples comes from a toy problem with only two or three features. If you have more than three features, it is not trivial to visualize this hyperplane.
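That said, if you only have two predictors (or pick two), one common workaround is to predict over a grid and draw the boundary yourself. A rough sketch, assuming a two-class data frame df with numeric predictors x1 and x2 and a factor column class (all of these names are hypothetical):
library(caret)
fit <- train(class ~ x1 + x2, data = df, method = "svmRadial",
             preProcess = c("center", "scale"))
# Classify every point of a fine grid spanning the predictor space.
grid <- expand.grid(x1 = seq(min(df$x1), max(df$x1), length.out = 200),
                    x2 = seq(min(df$x2), max(df$x2), length.out = 200))
grid$pred <- predict(fit, newdata = grid)
# Plot the data and overlay the boundary as the contour between the two classes.
plot(df$x1, df$x2, col = df$class, pch = 19)
contour(unique(grid$x1), unique(grid$x2),
        matrix(as.numeric(grid$pred), nrow = 200),
        levels = 1.5, add = TRUE, drawlabels = FALSE)
varImp(fit)   # the model-independent importance scores mentioned above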

Why is k-means clustering ignoring a significant patch of data?

I'm working with a set of co-ordinates, and want to dynamically (I have many sets that need to go through this process) understand how many distinct groups there are within the data. My approach was to apply k-means to investigate whether it would find the centroids and I could go from there.
When plotting some data with 6 distinct clusters (visually) the k-means algorithm continues to ignore two significant clusters while putting many centroids into another.
See image below:
Red points are the co-ordinate data and blue points are the centroids that k-means has provided. In this specific case I've gone for 15 centroids (an arbitrary choice), but it still doesn't recognise those patches of data on the right-hand side, instead putting a centroid midway between them while placing 8 in the cluster in the top right.
Admittedly there are slightly more data points in the top right, but not by much.
I'm using the standard k-means algorithm in R and just feeding in x and y co-ordinates. I've tried standardising the data, but this doesn't make any difference.
Any thoughts on why this is, or other potential methodologies that could be applied to try and dynamically understand the number of distinct clusters there are in the data?
You could try a self-organizing map (SOM):
This is a clustering algorithm based on neural networks that creates a discretized representation of the input space of the training samples, called a map, and is therefore also a method for dimensionality reduction.
This algorithm is also well suited to clustering because it does not require an a priori choice of the number of clusters (in k-means you need to choose k; here you do not). In your case it will hopefully find the number of clusters automatically, and you can actually visualize it.
You can find a very nice Python package called somoclu which implements this algorithm and offers an easy way to visualize the result. Otherwise you can go with R: there are blog posts with tutorials, and a CRAN package manual for SOM.
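In R, a minimal sketch with the kohonen package could look like the following; the grid size is an arbitrary choice and coords is assumed to be your two-column matrix of x/y points.
library(kohonen)
som_fit <- som(scale(coords),
               grid = somgrid(xdim = 8, ydim = 8, topo = "hexagonal"))
plot(som_fit, type = "counts")    # how many points map to each unit
plot(som_fit, type = "mapping")   # where the individual points land
# Units can then be grouped, e.g. by hierarchical clustering of the codebook
# vectors, to read off the distinct clusters (k = 6 matches the visual count).
groups <- cutree(hclust(dist(getCodes(som_fit))), k = 6)
plot(som_fit, type = "mapping", bgcol = rainbow(6)[groups])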
K-means is a randomized algorithm and it will get stuck in local minima.
Because of this, it is common to run k-means several times and keep the result with the smallest within-cluster sum of squares, i.e., the best of the local minima found.
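In base R this restart strategy is just the nstart argument; a quick illustration, with coords again standing in for the two-column matrix of points:
set.seed(42)
fit <- kmeans(coords, centers = 6, nstart = 50)   # keep the best of 50 random starts
fit$tot.withinss                                  # within-cluster sum of squares of that run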

How to set the resolution parameter for Louvain modularity in igraph?

Is there a way to set the resolution parameter when using the function cluster_louvain to detect communities in igraph for R? It makes a lot of difference to the result, as this parameter is related to the hierarchical dissimilarity between nodes. Thank you.
The easiest way to do it is through the resolution package, available at https://github.com/analyxcompany/resolution
It is based on this paper: http://arxiv.org/pdf/0812.1770.pdf
It has essentially two functions, cluster_resolution() and cluster_resolution_RandomOrderFULL().
In both you can set the resolution t and the number of repetitions rep, and you can pass the igraph object to the function directly.
cluster_resolution_RandomOrderFULL(g,t=0.5)
cluster_resolution_RandomOrderFULL(g,rep=20)
NOTE/EDIT: it will not accept signed networks, though! I'm trying to either contact the owner of the code or customize it myself to make it suitable for signed networks.
EDIT 2: I was able to translate the function community_louvain.m from the Brain Connectivity Toolbox for MATLAB to R.
Here is the GitHub link for signed_louvain().
You can call it with, for example, signed_louvain(g, gamma = 1, mod = 'modularity')
It works with igraph or matrix objects as input. If the network has negative values, you have to choose mod = 'neg_sym' or 'neg_asym'.

correlation matrix to build networks

I have used the MixOmics package in R for canonical correlation analysis of two matrices, and I have a resulting correlation matrix. I would like to build a correlation network from this result. I had earlier thought of using the gene set correlation analysis package, but I do not know how to install it and there are no sources on the internet describing how to install it in R (http://www.biostat.wisc.edu/~kendzior/GSCA/).
Also, could you suggest what other packages I could use to build networks with a correlation matrix as input? I thought of Rgraphviz but do not know if it is possible.
Copying this answer mostly from my previous answer at https://stackoverflow.com/a/7600901/567015
The qgraph package is mostly intended to visualize correlation matrices as a network. It plots variables as nodes and correlations as edges connecting the nodes; green edges indicate positive correlations, red edges indicate negative correlations, and the wider and more saturated an edge, the stronger the absolute correlation.
For example (this is the first example from the help page), the following code will plot the correlation matrix of a 240 variable dataset.
library("qgraph")
data(big5)
data(big5groups)
qgraph(cor(big5), minimum = 0.25, cut = 0.4, vsize = 2, groups = big5groups, legend = TRUE, borders = FALSE)
title("Big 5 correlations", line = -2, cex.main = 2)
You can also cluster strongly correlated nodes together (using a Fruchterman-Reingold layout), which creates quite a clear picture of what the structure of your correlation matrix actually looks like.
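If I recall correctly, that clustered view comes from adding the force-directed ("spring") layout to the same call, for example:
qgraph(cor(big5), layout = "spring", minimum = 0.25, cut = 0.4, vsize = 2,
       groups = big5groups, legend = TRUE, borders = FALSE)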
For an extensive introduction take a look at http://www.jstatsoft.org/v48/i04/paper
You might also want to take a look at the network and sna packages on CRAN. Both include tools for converting a matrix into a network data object.
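For example, a rough sketch of turning a thresholded correlation matrix into a network object with the network package; cor_mat stands for your correlation matrix and the 0.4 cutoff is arbitrary.
library(network)
adj <- (abs(cor_mat) > 0.4) * 1   # keep only reasonably strong correlations
diag(adj) <- 0                    # drop self-loops
net <- as.network(adj, directed = FALSE)
plot(net)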
