I have been looking into this for a while now and cannot seem to find an integrated approach. I have some discrete and some continuous variables (which can be discretized, but ideally I would like to keep them continuous).
I know there are other options (some of them open source); however, I would like to stick to R, as I am planning on doing statistical analysis.
QUESTION: Is there a package (other than bnspatial) in R to build Bayesian networks with spatial data? Ideally with a comprehensive example.
I'm looking for advice on creating classification trees where each split is based on multiple variables. A bit of background: I'm helping design a vegetation classification system, and we're hoping to use a classification and regression tree algorithm to both classify new veg data and create (or at least help to create) visual keys which can be used in publications. The data I'm using is laid out as community data, with tree species as columns, and observations as rows, and the first column is a factor with classes. I'll also add that I'm very new to this type of analysis, and while I've tried to read about it as much as possible, it's quite likely that I've missed some simple but important aspects. My apologies.
Now the problem: R has excellent packages and great documentation for classification with univariate splits (e.g. rpart, partykit, C5.0). However, I would ideally like to be able to create classification trees where each split is based on multiple criteria, so instead of each split having one decision (e.g. "Percent cover of Species A > 6.67"), it would have multiple (e.g. "Percent cover of Species A > 6.67 AND Percent cover of Species B < 4.2"). I've had a lot of trouble finding packages that are capable of doing multivariate splits and creating trees. This answer: https://stats.stackexchange.com/questions/4356/does-rpart-use-multivariate-splits-by-default has been very useful, and I've tried all the packages suggested there for multivariate splitting. Prim does do multivariate splits, but doesn't seem to make trees; the partDSA package seems to be somewhat what I'm looking for, but it also only creates trees with one criterion per split; the optpart package also doesn't seem to be able to make classification trees. If anyone has advice on how I could go about making a classification tree based on a multivariate partitioning method, that would be super appreciated.
Also, this is my first question, and I am very open to suggestions about how to ask questions. I didn't feel that providing an example would be helpful in this case, but if necessary I easily can.
Many Thanks!
In a decision tree we can see or visualize the node splits, and I want to do something similar. However, I am using SparkR, which does not have decision trees. So I am planning to use a random forest with 1 tree as the parameter, run it on SparkR, then save the model and use getTree to see the node splits and visualize them further with ggplot.
The short answer is no.
Models built with SparkR are not compatible with ones built with the respective R packages, in this case randomForest; hence, you will not be able to use the getTree function from the latter to visualize a tree from a random forest built with SparkR.
On a different level: I am surprised that decision trees have still not found their way into SparkR - they seem to have been ready for several months now in the GitHub repo; but even when they do arrive, they are not expected to offer methods for visualizing trees, and you will still not be able to use functions from other R packages for that purpose.
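For comparison, here is a minimal sketch of what the single-tree plan looks like when the model is fitted locally with the randomForest package itself. It assumes randomForest is installed and uses the built-in iris data purely as a stand-in. It works only because the fitted object is a randomForest object; it is not a way to inspect a model fitted with SparkR.

```r
# Sketch only: a local randomForest fit, NOT a SparkR model.
# The iris data set is an assumption used purely for illustration.
set.seed(1)

# Fit a "forest" consisting of a single tree
rf <- randomForest::randomForest(Species ~ ., data = iris, ntree = 1)

# Extract the node table of tree k = 1; labelVar = TRUE returns a data
# frame with variable names and class labels instead of integer codes
tree <- randomForest::getTree(rf, k = 1, labelVar = TRUE)

# One row per node: daughters, split variable, split point, status, prediction
print(head(tree))
```

The resulting data frame has one row per node, so it can be reshaped and plotted with ggplot as described in the question, as long as the model lives in local R.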
When it comes to network analysis in R, I am relatively familiar with igraph but not at all with sna.
My questions are:
Are these two libraries compatible? i.e. Can I apply an operation from sna to a graph created in igraph and vice versa?
Are there tasks that are performed more efficiently in one package than in the other?
Which library has a more comprehensive range of operations?
Overall, are there any strong reasons to do network analysis in R using either igraph or sna ?
P.S. Does either of these packages allow for multilayer (multiplex) network analysis?
My sort of big picture take on the differences between the two packages is that igraph is more geared toward graph theory and mathematical models of networks and sna is more geared toward statistical models of (primarily social) networks. The creators of igraph (I think) mostly have a background in computer sciences, while the sna people are sociologists and statisticians. I primarily work in sna (and related packages that comprise the statnet suite of packages -- I am in the social sciences), but use igraph often as well, sometimes within the same script.
To answer your specific questions:
1) No, they are not. Many of the functions in igraph have the same name in sna and this causes conflicts. An igraph graph cannot be used in an sna function. The package intergraph was created to make it easy to switch between sna and igraph. So I could send an igraph graph to an sna function by passing it to intergraph first -- e.g. sna::evcent(intergraph::asNetwork(g)), assuming g is an igraph network. If you use both together in a script, you need to call out the package explicitly when running a function, or load and unload the packages as needed.
2) In my experience, I have not found one to be more efficient than the other. Both are well developed and maintained packages. I believe that igraph is a bit better suited for large graphs--it has some functions that are modified to save on computational time when run on large graphs. But I do not have direct experience here. Although I would say that igraph is generally better at visualizations.
3) I would say that neither has an edge in comprehensiveness. Both do all the main network analysis stuff (centrality, network topology). They differ in their more "advanced" features. See my general point--they are geared to overlapping but distinct issues in network analysis. There's a lot of stuff in sna that is not available in igraph (e.g. related to statistical inference, like QAP regression [netlm / netlogit] or network autocorrelation models [lnam]), and vice versa (community detection functions like cluster_fast_greedy, for example). sna is extended by a number of compatible packages that do things like latent space models and exponential random graph models.
4) Ceteris paribus, no. To me, the choice is primarily needs driven. If you are interested in statistical inference, you need to work in sna. If not, igraph generally serves. Based on the questions on Stack Overflow, igraph seems to be more popular, but that of course could be due to selection bias. For that reason alone, if I didn't need to statistically model networks, I would probably mostly use igraph. Again, both packages are great, serving overlapping, but slightly different needs.
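The conversion from point 1 can be sketched as follows. This is a minimal example, assuming igraph, sna, network and intergraph are all installed; the Zachary karate club graph is only a convenient built-in test graph. Every call is namespace-qualified rather than attaching the packages with library(), which sidesteps the name conflicts mentioned above.

```r
# Sketch only: moving a graph between igraph and sna via intergraph.
# The Zachary karate club graph is an assumption used for illustration.
g <- igraph::make_graph("Zachary")        # igraph object, undirected

# igraph -> network object, which sna functions accept
net <- intergraph::asNetwork(g)

# Eigenvector centrality computed by sna on the converted graph
ev_sna <- sna::evcent(net, gmode = "graph")

# network -> igraph, to continue with igraph functions
g2 <- intergraph::asIgraph(net)
comm <- igraph::cluster_fast_greedy(g2)

print(length(ev_sna))
print(igraph::membership(comm))
```

Calling functions as pkg::fun() is a bit more verbose, but it means neither package masks the other's functions within the same script.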
Not sure what you mean by "multilayer network analysis", but both igraph and sna work with multiplex networks. You can certainly analyze multiplex networks and multilevel networks in sna. (Here, multiplex means a network with a variety of tie types, e.g. friendship and advice, and multilevel means either nested networks or multiple networks from the same population; the terminology is a bit confused at this point.) It depends on what you want to do, and often takes some wrangling, but it is possible to an extent.
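As an illustration of the kind of wrangling involved, here is one possible way to hold a small multiplex network in igraph: store the tie type as an edge attribute and pull out one layer at a time. The toy edge list and the helper `layer()` are hypothetical and exist only for this sketch; this is one convention among several, not a built-in multiplex API.

```r
# Sketch only: a toy multiplex network with two tie types.
# The data and the helper layer() are hypothetical, for illustration.
ties <- data.frame(
  from = c("a", "a", "b", "c", "a", "b"),
  to   = c("b", "c", "c", "d", "d", "d"),
  type = c("friendship", "friendship", "advice",
           "advice", "advice", "friendship"),
  stringsAsFactors = FALSE
)

# Extra columns of the edge list become edge attributes (here: type)
g <- igraph::graph_from_data_frame(ties, directed = TRUE)

# Keep all vertices, but only the edges of one tie type
layer <- function(graph, tie_type) {
  drop <- igraph::E(graph)[igraph::E(graph)$type != tie_type]
  igraph::delete_edges(graph, drop)
}

friendship <- layer(g, "friendship")
advice     <- layer(g, "advice")

# Per-layer statistics, e.g. out-degree in each layer
deg_friend <- igraph::degree(friendship, mode = "out")
deg_advice <- igraph::degree(advice, mode = "out")
print(rbind(friendship = deg_friend, advice = deg_advice))
```

Each layer keeps the full vertex set, so per-layer measures line up vertex by vertex and can be compared directly.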
I am working on a small project in survival analysis and making use of the paper:
Lagakos, De Gruttola - Analysis of Doubly Censored Data, with applications to AIDS.
Before implementing their method in R, I wanted to know whether there are already packages that implement it. I am aware of the package dblcens; however, its functions use an EM algorithm rather than the algorithm described by Lagakos and De Gruttola.
Has this algorithm already been implemented in some R package?
Thank you.
I would like to fit an LSTM model using MXNET in R for the purpose of predicting a continuous response (i.e., regression) given several continuous predictors. However, the mx.lstm() function seems to be geared toward NLP as it requires arguments which don't seem applicable to a regression problem (such as those related to embedding).
Is MXNET capable of this sort of modeling and, if not, what is an example of an appropriate tool (preferably in R)? Are there any tutorials relevant to the problem I've described?
LSTM is used for working with temporal data: text, speech, time series. If you want to predict a continuous response, then I assume you want to do something similar to time series analysis.
If my assumption is correct, then, please, take a look here. It gives quite a good example on how to use MxNet with R for time series on CPU. The GPU version is also available here.