How to perform semi-supervised k-means clustering - R

I am new to R and am trying to perform semi-supervised k-means clustering. I plan to use 2/3 of my data as a training set and 1/3 as a test set. My objective is to train a model using the known clusters and then propagate the trained model to the test set. The propagation result will be compared with the prior clusters in order to check the prediction accuracy of k-means clustering. Is there a way to do semi-supervised k-means clustering in R? Is any package needed?
Thank you.
Regards,

Use kmeans(). It is part of the stats package, which ships with every standard R installation. You can read a function's documentation by putting a ? before its name, e.g. ?kmeans.
Search online if you're still unsure how to use the function; there are plenty of guides and toy examples.
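As a rough illustration of the train/test workflow described in the question, here is a minimal sketch using only base R. It is not a true semi-supervised algorithm: kmeans() ignores the labels, which are only used afterwards to name the clusters and score the test set. The iris data, the 2/3 split and the helper nearest_centroid() are assumptions made for the example.

```r
set.seed(42)

# Stand-in data: four numeric features plus a known grouping (Species).
features <- as.matrix(iris[, 1:4])
known    <- as.character(iris$Species)

# 2/3 of the rows for training, the remaining 1/3 for testing.
train_idx <- sample(nrow(features), size = round(2 / 3 * nrow(features)))
train <- features[train_idx, ]
test  <- features[-train_idx, ]

# Cluster the training set only.
km <- kmeans(train, centers = 3, nstart = 25)

# Hypothetical helper: assign each new row to the closest centroid
# (squared Euclidean distance), which is the rule k-means itself uses.
nearest_centroid <- function(newdata, centers) {
  d <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(newdata, 2, centers[k, ])^2)
  })
  apply(d, 1, which.min)
}
test_cluster <- nearest_centroid(test, km$centers)

# Name each cluster after the most frequent known label among its training rows.
tab <- table(km$cluster, known[train_idx])
cluster_label <- colnames(tab)[apply(tab, 1, which.max)]

# Compare the propagated labels with the known ones on the test set.
pred <- cluster_label[test_cluster]
accuracy <- mean(pred == known[-train_idx])
print(table(predicted = pred, known = known[-train_idx]))
print(accuracy)
```

The majority-vote mapping is one simple choice; two clusters can end up with the same label if the clustering does not line up with the known groups.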
M

Related

How can one calculate ROC's AUCs in complex designs with clustering in R?

The packages I have found so far that calculate AUCs do not account for sample clustering, which increases standard errors compared to simple random sampling. I wonder if the standard errors provided by these packages could be recalculated to allow for clustering.
Thank you.
Your best bet is probably replicate weights, as long as you can get point estimates of AUC that incorporate weights.
If you convert your design into a replicate-weights design object (using survey::as.svrepdesign()), you can then evaluate any R function or expression with the replicate weights via survey::withReplicates(), which returns a standard error.
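A minimal sketch of that approach, assuming the survey package is installed. The clustered data are simulated, and weighted_auc() is a hypothetical helper written for this example (a weighted Mann-Whitney estimate), not a function from any package.

```r
library(survey)
set.seed(1)

# Simulated clustered sample (made-up data): 20 clusters of 15 units each.
n_clusters <- 20
m <- 15
df <- data.frame(cluster = rep(seq_len(n_clusters), each = m))
df$score <- rnorm(nrow(df)) + rnorm(n_clusters)[df$cluster]  # shared cluster effect
df$y <- rbinom(nrow(df), 1, plogis(df$score))                # binary outcome
df$w <- runif(nrow(df), 1, 3)                                # sampling weights

# Hypothetical helper: weighted AUC, i.e. the weighted share of
# (positive, negative) pairs in which the positive has the higher score;
# ties count as one half.
weighted_auc <- function(w, data) {
  pos <- data$y == 1
  cmp <- outer(data$score[pos], data$score[!pos],
               function(a, b) (a > b) + 0.5 * (a == b))
  sum(outer(w[pos], w[!pos]) * cmp) / (sum(w[pos]) * sum(w[!pos]))
}

# Cluster design, converted to replicate weights (jackknife by default here).
des     <- svydesign(ids = ~cluster, weights = ~w, data = df)
rep_des <- as.svrepdesign(des)

# withReplicates() calls the function once per set of replicate weights
# and derives a standard error from the spread of the results.
res <- withReplicates(rep_des, weighted_auc)
print(res)
```

The pairwise outer() comparison is quadratic in the sample size, so a rank-based formula would be preferable for large samples.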

Training Hidden Markov Model in R

Is it possible to train a Hidden Markov Model in R?
I have a set of observations with their corresponding labels, and I need to train an HMM in order to get the Markov parameters (i.e. the transition probability matrix, the emission probability matrix and the initial distribution) so that I can predict future observations.
In other words, I need the opposite of the forward-backward algorithm.
Yes, you can. R is a good tool for simulation and statistical analysis, and many good packages are available. You do not need to implement an HMM yourself (though of course you can); just learn to use the packages.
An example of using a package.
An example of implementing an HMM is here; in it, a DNA sequence is modeled using an HMM.
A similar question is asked here as well.
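When every observation already comes with its state label, as in the question, the three parameter sets can be estimated by simple counting. Below is a minimal base-R sketch of that idea; the sequences, state names and symbol names are made up for illustration.

```r
# Two made-up labelled sequences: hidden states and the symbols they emitted.
seqs <- list(
  list(state = c("A", "A", "B", "B", "A"), obs = c("x", "x", "y", "y", "x")),
  list(state = c("B", "B", "A", "A", "A"), obs = c("y", "x", "x", "x", "y"))
)
states  <- c("A", "B")
symbols <- c("x", "y")

# Initial distribution: how often each state starts a sequence.
init_counts <- table(factor(sapply(seqs, function(s) s$state[1]), levels = states))

# Transition counts: state at time t (rows) versus state at time t + 1 (columns).
trans_counts <- Reduce(`+`, lapply(seqs, function(s) {
  table(factor(head(s$state, -1), levels = states),
        factor(tail(s$state, -1), levels = states))
}))

# Emission counts: state (rows) versus observed symbol (columns).
emis_counts <- Reduce(`+`, lapply(seqs, function(s) {
  table(factor(s$state, levels = states), factor(s$obs, levels = symbols))
}))

# Normalise counts to probabilities (each row sums to 1).
init_prob  <- prop.table(init_counts)
trans_prob <- prop.table(trans_counts, margin = 1)
emis_prob  <- prop.table(emis_counts, margin = 1)

print(init_prob)
print(trans_prob)
print(emis_prob)
```

With real data, a state that never occurs would give a row of zero counts and NaN probabilities, so adding a small pseudo-count before normalising is a common precaution.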

Has anyone used the rnn package in R for a Recurrent Neural Network? How do I use it for prediction?

The rnn() function in R has no return statement. It generates synapses for the input, hidden and output layers. How do I use these for prediction with a test sample of time series data?
The rnn package was just updated: as of version 0.5.0, it can generalize beyond the toy example of binary addition.
You must use the trainr function to train the model and the predictr function to predict values for your data.
So far it only supports synchronized many-to-many learning, meaning that each new time point in the input produces an output.
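A hedged sketch of the trainr/predictr workflow, assuming a version of the rnn package in which both functions take arrays with dimensions samples x time x variables. The sine-wave task, the hyperparameter values and the train/test split are made up for illustration and are not tuned.

```r
library(rnn)
set.seed(1)

# Made-up task: predict the next value of a sine wave, rescaled to [0, 1].
n_series <- 20
n_time   <- 30
t_grid   <- seq(0, 4 * pi, length.out = n_time + 1)
series   <- t(replicate(n_series, (sin(t_grid + runif(1, 0, 2 * pi)) + 1) / 2))

# dim 1 = samples, dim 2 = time, dim 3 = variables.
X <- array(series[, 1:n_time],       dim = c(n_series, n_time, 1))
Y <- array(series[, 2:(n_time + 1)], dim = c(n_series, n_time, 1))

train <- 1:15
test  <- 16:20

# Train on the first 15 series (synchronized many-to-many: one output per time point).
model <- trainr(Y = Y[train, , , drop = FALSE],
                X = X[train, , , drop = FALSE],
                learningrate = 0.05,
                hidden_dim   = 8,
                numepochs    = 50)

# Predict the held-out series.
pred <- predictr(model, X[test, , , drop = FALSE])
```

The predictions can then be compared with Y[test, , 1], for example by plotting one held-out series against its prediction.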

R - Party package: is cforest really bagging?

I'm using the "party" package to create a random forest of regression trees.
I've created a ForestControl class in order to limit the number of trees (ntree), the tree depth (maxdepth) and the number of variables I use to fit a tree (mtry).
One thing I'm not sure of is whether the cforest algorithm uses subsets of my training set for each tree it generates.
I've seen in the documentation that it is bagging, so I assume it should. But I'm not sure I understand what the "subset" input of that function is.
I'm also puzzled by the results I get using ctree: when plotting the tree, I see that all the variables of my training set are classified in the different terminal tree nodes, while I would have expected it to use only a subset here too.
So my question is: is cforest doing the same thing as ctree, or is it really bagging my training set?
Thanks in advance for your help!
Ben

Predict in Clustering

In R, is there a predict function for clustering like the one we have for classification?
What can we conclude from the clustering plot that we get from R, other than comparing two clusters?
Clustering does not pay attention to prediction capabilities. It just tries to find objects that seem to be related. That is why there is no "predict" function for clustering results.
However, in many situations, learning classifiers based on the clusters offers improved performance. For this, you essentially train a classifier to assign the object to the appropriate cluster, then classify it using a classifier trained only on examples from this cluster. When the cluster is pure, you can even skip this second step.
The reason is the following: there may be multiple types that are classified with the same label. Training a classifier on the full data set may be hard, because it will try to learn both clusters at the same time. Splitting the class into two groups, and training a separate classifier for each, can make the task significantly easier.
Many packages offer predict methods for cluster objects. One such example is clue, with cl_predict.
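A short sketch of cl_predict applied to a kmeans fit, assuming the clue package is installed. The iris data and the 100/50 split are arbitrary choices for the example.

```r
library(clue)
set.seed(1)

x <- as.matrix(iris[, 1:4])
train_idx <- sample(nrow(x), 100)

# Cluster the training rows, then predict cluster ids for the held-out rows.
km      <- kmeans(x[train_idx, ], centers = 3, nstart = 10)
new_ids <- cl_predict(km, newdata = x[-train_idx, ])

# Cross-tabulate the predicted cluster ids against the known species.
print(table(cluster = as.integer(new_ids), species = iris$Species[-train_idx]))
```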
The best practice when doing this is to apply the same rules that were used to cluster the training data. For example, in kernel k-means you should compute the kernel distance between your data point and the cluster centers. The minimum determines the cluster assignment (see here for example). In spectral clustering you should project your data point's dissimilarities onto the eigenfunctions of the training data and compare the Euclidean distances to the k-means centers in that space; again, the minimum determines the cluster assignment (see here for example).
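The kernel k-means rule can be sketched in base R as follows. In feature space, the squared distance from a point x to the mean of cluster C is k(x, x) - 2 * mean_i k(x, x_i) + mean_ij k(x_i, x_j), with x_i, x_j ranging over the members of C. The RBF kernel, the bandwidth sigma, the helper names and the toy data are assumptions; the training labels here are simply given and stand in for the output of an actual kernel k-means run.

```r
set.seed(1)

# RBF kernel matrix between the rows of A and the rows of B.
rbf <- function(A, B, sigma = 1) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  exp(-d2 / (2 * sigma^2))
}

# Hypothetical helper: assign each row of newx to the cluster whose
# feature-space mean is closest in kernel distance.
kernel_assign <- function(newx, X, labels, sigma = 1) {
  ids <- sort(unique(labels))
  d <- sapply(ids, function(cl) {
    Xc <- X[labels == cl, , drop = FALSE]
    # k(x, x) = 1 for the RBF kernel
    1 - 2 * rowMeans(rbf(newx, Xc, sigma)) + mean(rbf(Xc, Xc, sigma))
  })
  d <- matrix(d, nrow = nrow(newx))  # keep a matrix even for a single new point
  ids[apply(d, 1, which.min)]
}

# Toy training data: two well-separated blobs with given cluster labels.
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2))
labels <- rep(1:2, each = 20)

newx <- rbind(c(0, 0), c(5, 5))
assigned <- kernel_assign(newx, X, labels)
print(assigned)
```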
