I am applying hclust for hierarchical clustering and poLCA for latent class analysis to a data set in R.
I need to compare the two clustering methods, both numerically and graphically.
Can someone suggest methods to check the homogeneity and heterogeneity of the above clustering results?
There is a package called clValid that may help you.
Here is the package vignette: https://cran.r-project.org/web/packages/clValid/vignettes/clValid.pdf
Here is the R documentation: https://cran.r-project.org/web/packages/clValid/clValid.pdf
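For example, clValid can score several clusterings side by side using internal validation measures, with numerical output via summary() and plots via plot(). A minimal sketch (mtcars is a stand-in for your data; note that clValid covers standard algorithms such as hclust and k-means, so a poLCA solution would have to be compared separately, e.g. via an agreement index on the class assignments):

```r
library(clValid)

# Internal validation of hierarchical vs. k-means solutions on a
# numeric data set; 'mtcars' is only a placeholder for your data.
iv <- clValid(scale(mtcars), nClust = 2:5,
              clMethods = c("hierarchical", "kmeans"),
              validation = "internal")

summary(iv)   # connectivity, Dunn index and silhouette width per method and k
plot(iv)      # pictorial comparison of the measures across methods
```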
Just a basic question concerning k-means clustering analysis on survival data, like this one:
I am doing k-means clustering to identify clusters of genes that influence survival the most. However, do I include the survival time in my k-means analysis, or should I leave it out? In other words, should I pass it to the kmeans() function in R?
Kind regards,
Hashriama
I think your approach is not the best one. Your goal is to select genes associated with censored/uncensored survival, so supervised methods seem most suitable. k-means will only cluster genes by similarity, without regard to survival, and even if you added survival time to your model it would not make sense, because you would be ignoring censoring.
There are Cox regressions with an added L1 penalty, which allow variable selection without discarding the censoring information. That kind of approach seems more appropriate for your goal and fits your context better. To learn more, here is an article by Jiang Gui & Hongzhe Li that uses penalized Cox regression (have a look at the R package biospear too, if needed):
https://academic.oup.com/bioinformatics/article/21/13/3001/196819
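As a concrete illustration, an L1-penalized Cox model of this kind can be fitted with the glmnet package; the object names expr, time and status below are placeholders for your expression matrix and survival outcome:

```r
library(glmnet)
library(survival)

# Placeholders: 'expr' is a samples-by-genes expression matrix,
# 'time' and 'status' are the survival time and event indicator.
y <- Surv(time, status)

# alpha = 1 gives the lasso (pure L1) penalty; cross-validation picks lambda.
cvf <- cv.glmnet(expr, y, family = "cox", alpha = 1)

# Genes with nonzero coefficients at the selected penalty are "selected".
coef(cvf, s = "lambda.min")
```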
I am looking for R packages for hierarchical clustering (or other clustering methods) that can handle mixed data types. I have a data set with continuous and ordinal variables.
Any recommendations are greatly appreciated.
I am using FactoMineR. It handles mixed data easily and is well documented.
Damien
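A short sketch of that workflow: FAMD handles the mixed variables, and HCPC then builds a hierarchical clustering on the resulting dimensions. The data frame df is a placeholder (ordinal variables are assumed to be coded as factors):

```r
library(FactoMineR)

# 'df' stands in for a data frame mixing numeric and factor columns.
res <- FAMD(df, graph = FALSE)                 # factor analysis of mixed data

# Hierarchical clustering on the FAMD dimensions;
# nb.clust = -1 picks the number of clusters automatically.
hc <- HCPC(res, nb.clust = -1, graph = FALSE)

table(hc$data.clust$clust)                     # cluster sizes
```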
There is an implementation of the AdaBoost algorithm in R. See this link; the function is called boosting.
The problem is that this package uses classification trees as the base (weak) learner.
Is it possible to substitute any other weak learner (e.g., an SVM or a neural network) within this package?
If not, are there any examples of AdaBoost implementations in R?
Many thanks!
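For reference, a generic AdaBoost.M1 loop that accepts an arbitrary weak learner can be sketched in a few lines of base R. All names below are illustrative, labels are assumed to be coded as -1/+1, and reweighting is done via weighted resampling so that learners without native case weights (SVMs, neural networks, ...) can be plugged in through fit_fun/pred_fun:

```r
# Generic AdaBoost.M1 sketch: the weak learner is supplied as a pair of
# functions, fit_fun(X, y) -> model and pred_fun(model, X) -> labels in {-1, 1}.
adaboost_m1 <- function(X, y, M = 20, fit_fun, pred_fun) {
  n <- length(y)
  w <- rep(1 / n, n)                                # observation weights
  models <- list(); alphas <- numeric(0)
  for (m in seq_len(M)) {
    idx  <- sample(n, n, replace = TRUE, prob = w)  # weighted resampling
    fit  <- fit_fun(X[idx, , drop = FALSE], y[idx])
    pred <- pred_fun(fit, X)
    err  <- sum(w * (pred != y))                    # weighted training error
    if (err == 0 || err >= 0.5) break               # weak-learner assumption violated
    alpha <- 0.5 * log((1 - err) / err)
    w <- w * exp(alpha * (pred != y))               # upweight misclassified cases
    w <- w / sum(w)
    models[[length(models) + 1]] <- fit
    alphas <- c(alphas, alpha)
  }
  list(models = models, alphas = alphas, pred_fun = pred_fun)
}

predict_adaboost <- function(ens, X) {
  # Weighted vote of the ensemble members.
  scores <- Reduce(`+`, Map(function(a, fit) a * ens$pred_fun(fit, X),
                            ens$alphas, ens$models))
  ifelse(scores >= 0, 1, -1)
}
```

For example, logistic regression becomes the weak learner with fit_fun <- function(X, y) glm(factor(y) ~ ., data = data.frame(X), family = binomial) and a pred_fun that thresholds the predicted probabilities at 0.5.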
I'm experimenting with Bayesian networks in R and have built some networks using the bnlearn package. I can use them to make predictions for new observations with predict(); however, I would also like to have the posterior distribution over the possible classes. Is there a way of retrieving this information?
It seems there is a prob parameter that does this for the naive Bayes implementation in the bnlearn package, but not for networks fitted with bn.fit.
Thankful for any help with this.
See the documentation of bnlearn: the predict function implements the prob argument only for naive.bayes and TAN classifiers. In short, this is because the other methods do not necessarily compute posterior probabilities.
From the bnlearn documentation: predict returns the predicted values for node, given the data specified by data. Depending on the value of method, the predicted values are computed as follows: a) parents, b) bayes-lw. When using bayes-lw, likelihood weighting simulations are performed to make the predictions.
Hope this helps. :)
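As a workaround, an approximate posterior over a class node can be computed from likelihood-weighting samples drawn with cpdist. A sketch using bnlearn's built-in learning.test data, where node "A" stands in for the class of interest and the evidence values are illustrative:

```r
library(bnlearn)

# Built-in example data; "A" plays the role of the class node here.
data(learning.test)
fitted <- bn.fit(hc(learning.test), learning.test)

# Draw likelihood-weighting samples of A given (illustrative) evidence.
sims <- cpdist(fitted, nodes = "A",
               evidence = list(B = "b", F = "a"), method = "lw")

# Approximate posterior: sum the likelihood weights per class and normalize.
w <- attr(sims, "weights")
post <- tapply(w, sims$A, sum)
post / sum(post)
```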
I've been using the caret package in R to run some boosted regression tree and random forest models and am hoping to generate prediction intervals for a set of new cases using the inbuilt cross-validation routine.
The trainControl function allows you to save the hold-out predictions at each of the n folds, but I'm wondering whether unknown cases can also be predicted at each fold using the built-in functions, or whether I need a separate loop to build the models n times.
Any advice is much appreciated.
Check the R package quantregForest, available on CRAN. It can easily calculate prediction intervals for random forest models. There's a nice paper by the package author explaining the background of the method. (Sorry, I can't say anything about prediction intervals for BRT models; I'm still looking for those myself...)
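A minimal sketch of that approach: fit a quantile regression forest and read off a prediction interval as a pair of conditional quantiles. Xtrain, ytrain and Xnew below are placeholders for your training data and the unknown cases:

```r
library(quantregForest)

# Placeholders: Xtrain/ytrain are the training predictors and response,
# Xnew holds the new/unknown cases.
qrf <- quantregForest(x = Xtrain, y = ytrain)

# 90% prediction intervals: the 5th and 95th conditional quantiles.
pi90 <- predict(qrf, newdata = Xnew, what = c(0.05, 0.95))
head(pi90)   # one lower/upper bound pair per row of Xnew
```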