I am looking for R packages for hierarchical clustering (or other clustering methods) that can handle mixed data types. I have a data set with continuous and ordinal variables.
Any recommendations are greatly appreciated.
I am using FactoMineR. This can handle mixed data easily and is well documented.
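As a minimal sketch of how that might look, assuming the FactoMineR workflow of a factor analysis of mixed data (`FAMD()`) followed by hierarchical clustering on the resulting components (`HCPC()`): the data frame and its column names below are simulated purely for illustration. Note that `FAMD()` treats factors as categorical, so an ordinal variable coded as a factor loses its ordering; coding it as numeric ranks is an alternative.

```r
library(FactoMineR)

set.seed(123)
n <- 90

# Simulated mixed data: two continuous variables and one categorical rating
# (hypothetical columns, not from the original question)
df <- data.frame(
  height = c(rnorm(30, 160, 5), rnorm(30, 175, 5), rnorm(30, 190, 5)),
  weight = c(rnorm(30, 55, 4), rnorm(30, 72, 4), rnorm(30, 90, 4)),
  rating = factor(sample(c("low", "medium", "high"), n, replace = TRUE))
)

# Factor analysis of mixed data, keeping 3 components
res_famd <- FAMD(df, ncp = 3, graph = FALSE)

# Hierarchical clustering on the components, cutting the tree into 3 clusters
# (nb.clust = -1 would let HCPC choose the cut automatically)
res_hcpc <- HCPC(res_famd, nb.clust = 3, graph = FALSE)

# Cluster membership is stored alongside the original data
clusters <- res_hcpc$data.clust$clust
table(clusters)
```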
Damien
Related
Good day,
for presentation purposes I would like to plot a couple of decision trees from a random forest (with about 100 trees). I found a post from last year that makes it clear this is not really possible, or at least that there is no function for it in tidymodels: "R: Tidymodels: Is it possible to plot the trees for a random forest model in tidy models?"
I'm wondering if somebody has found a way! I remember I could easily do this using the caret package, but tidymodels makes everything so convenient that I was hoping someone had a solution.
Many thanks!
Summarizing which trees can be plotted with tidymodels, based on the comments and other Stack Overflow posts:
Decision trees. There are some options, but the function rpart.plot() seems to be the most popular.
Individual tree from a random forest. It doesn't seem to be possible to plot one (yet) within the tidymodels environment. See this post: here
XGBoost models: See Julia's comment:
You should be able to use a function like xgb.plot.tree() with a
trained tidymodels workflow or parsnip model by extracting out the
underlying object created with the xgboost engine. You can do this
with extract_fit_engine()
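A minimal sketch of what that comment describes, assuming the parsnip and xgboost packages are installed (the `mtcars` data and the tuning values are placeholders chosen only for illustration). Plotting additionally needs the DiagrammeR package, so that step is guarded here.

```r
library(parsnip)
library(xgboost)

# Small boosted-tree specification using the xgboost engine
spec <- boost_tree(trees = 5, tree_depth = 2) |>
  set_engine("xgboost") |>
  set_mode("regression")

fit_obj <- fit(spec, mpg ~ ., data = mtcars)

# Pull out the underlying xgboost object from the parsnip model
booster <- extract_fit_engine(fit_obj)
class(booster)

# Plot the trees; xgb.plot.tree() relies on DiagrammeR for rendering
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  xgb.plot.tree(model = booster)
}
```

The same `extract_fit_engine()` call should also work on a fitted workflow.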
I am applying hclust for hierarchical clustering in R and poLCA for latent class analysis on a data set.
I need to compare the results of the two clustering methods, both numerically and graphically.
Can someone suggest a method to check the homogeneity and heterogeneity of the clusters produced by these methods?
There is a package called clValid that may help you.
Here is the paper: https://cran.r-project.org/web/packages/clValid/vignettes/clValid.pdf
Here is the R documentation: https://cran.r-project.org/web/packages/clValid/clValid.pdf
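A minimal sketch of an internal validation run with clValid, using the built-in `mtcars` data only as a stand-in. As far as I know, clValid runs its own set of built-in algorithms (hierarchical, k-means, PAM and others), so a poLCA solution would probably have to be scored separately.

```r
library(clValid)

# Numeric matrix with row names; rows are the items to cluster
x <- scale(mtcars)

# Compare hierarchical clustering and k-means for 2 to 5 clusters
# using the internal measures (connectivity, Dunn index, silhouette width)
res <- clValid(x, nClust = 2:5,
               clMethods = c("hierarchical", "kmeans"),
               validation = "internal")

summary(res)

# Best method and cluster count per measure
best <- optimalScores(res)
best

# Pictorial comparison of the measures across cluster counts:
# plot(res)
```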
I am using the ‘arules’ package to mine frequent itemsets in a large data set, but I cannot find a suitable method for discretization.
As the examples in ‘arules’ show, several basic unsupervised methods are available through the function ‘discretize’. However, I would like to estimate the optimal number of categories from the data, which seems more reasonable than fixing the number of categories in advance.
Can you give me some advice on this? Thanks.
#Michael Hahsler
I think there is little guidance on this for unsupervised discretization. Look at the histogram of each variable and decide manually. For k-means, you could use internal validation techniques (e.g., the elbow method) to find k. For supervised discretization, methods exist that will help you decide. Maybe someone else can help here.
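A rough sketch of the elbow idea for a single variable, in base R on simulated data (the variable `x` and the choice of `k` are illustrative assumptions). The within-cluster sum of squares is plotted against k, k is picked by eye where the curve flattens, and the variable is then cut at the midpoints between neighbouring cluster centers.

```r
set.seed(42)

# Simulated variable with three well-separated groups
x <- c(rnorm(200, 0, 1), rnorm(200, 6, 1), rnorm(200, 12, 1))

# Total within-cluster sum of squares for k = 1..8
ks <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

# Look for the "elbow" where adding clusters stops helping much
plot(ks, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")

# k chosen by eye from the plot (an assumption for this simulated data)
k <- 3
km <- kmeans(x, centers = k, nstart = 20)

# Cut points halfway between neighbouring cluster centers
centers <- sort(as.numeric(km$centers))
cuts <- (head(centers, -1) + tail(centers, -1)) / 2
bins <- cut(x, breaks = c(-Inf, cuts, Inf))
table(bins)

# With arules installed, a comparable call would be:
# arules::discretize(x, method = "cluster", breaks = k)
```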
I want to build a bagged logistic regression model in R. My dataset is highly imbalanced: only 0.007% of the cases are positive.
My idea for solving this was to use bagged logistic regression. I came across the hybridEnsemble package in R. Does anyone have an example of how this package can be used? I searched online, but unfortunately did not find any examples.
Any help will be appreciated.
The way that I would try to solve this is to use the h2o.stackedEnsemble() function in the h2o R package. You can automatically create more balanced classifiers by using the balance_classes = TRUE option in all of the base learners. More information about how to use this function to create ensembles is located in the Stacked Ensemble H2O docs.
Also, using H2O will be way faster than anything that's written in native R.
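A minimal sketch of that approach, assuming the h2o package and a Java runtime are available. The data are simulated, and the base learners (a GBM and a random forest) and their settings are illustrative choices, not a recommendation. The base models must share the same folds and keep their cross-validation predictions so that they can be stacked.

```r
library(h2o)
h2o.init()

# Simulated imbalanced binary outcome (hypothetical data)
set.seed(1)
n <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- factor(rbinom(n, 1, plogis(-3.5 + 1.5 * df$x1)))

train <- as.h2o(df)
y <- "y"
x <- setdiff(names(df), y)

# Base learners with class balancing and shared cross-validation folds
gbm <- h2o.gbm(x = x, y = y, training_frame = train,
               ntrees = 20, nfolds = 5, fold_assignment = "Modulo",
               keep_cross_validation_predictions = TRUE,
               balance_classes = TRUE, seed = 1)

rf <- h2o.randomForest(x = x, y = y, training_frame = train,
                       ntrees = 20, nfolds = 5, fold_assignment = "Modulo",
                       keep_cross_validation_predictions = TRUE,
                       balance_classes = TRUE, seed = 1)

# Stack the base learners with a metalearner
ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train,
                                base_models = list(gbm, rf))
ensemble
```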
I am new to R and am trying to perform semi-supervised k-means clustering. I plan to use 2/3 of my data as a training set and 1/3 as a test set. My objective is to train a model using the known clusters and then apply the trained model to the test set. The result will be compared with the prior clusters in order to check the prediction accuracy of k-means clustering. Is there a way to do semi-supervised k-means clustering in R, and is any package needed?
Thank you.
regards,
Use kmeans(). It is part of the stats package, which ships with every standard R installation. You can read how to use a function by putting a ? before its name, e.g. ?kmeans.
Search online if you're still lost about how to use the function - there are plenty of guides and toy examples online.
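A small sketch of the train/test idea from the question, using the built-in `iris` data as a stand-in. Note that kmeans() is unsupervised and has no predict method, so this is only one possible workaround: clusters are fitted on the training set without using the labels, test points are assigned to the nearest center, and each cluster is then mapped to its majority training label to compute an accuracy. The helper `assign_cluster()` is a hypothetical function written for this example.

```r
set.seed(1)

# 2/3 training, 1/3 test split
idx <- sample(nrow(iris), size = round(2 / 3 * nrow(iris)))
train <- scale(iris[idx, 1:4])

# Scale the test set with the training means and standard deviations
test <- scale(iris[-idx, 1:4],
              center = attr(train, "scaled:center"),
              scale = attr(train, "scaled:scale"))

# Fit k-means on the training data only (labels are not used here)
km <- kmeans(train, centers = 3, nstart = 25)

# Assign each new observation to the closest cluster center
assign_cluster <- function(newdata, centers) {
  unname(apply(newdata, 1, function(row) {
    which.min(colSums((t(centers) - row)^2))
  }))
}
pred <- assign_cluster(test, km$centers)

# Map each cluster to its majority label in the training set
train_labels <- iris$Species[idx]
cluster_to_label <- tapply(train_labels, km$cluster,
                           function(l) names(which.max(table(l))))
pred_label <- cluster_to_label[as.character(pred)]

# Compare with the known labels of the test set
test_labels <- as.character(iris$Species[-idx])
table(predicted = pred_label, actual = test_labels)
accuracy <- mean(pred_label == test_labels)
accuracy
```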
M