How to create a cclust object given a dataframe of indices - r

I need to use the clustIndex function from the cclust package in R.
The prototype of the function is as follows:
clustIndex ( y, x, index = "all" )
y Object of class "cclust" returned by a clustering algorithm such as kmeans
x Data matrix where columns correspond to variables and rows to observations
index The indexes that are calculated "calinski", "cindex", "db", "hartigan",
"ratkowsky", "scott", "marriot", "ball", "trcovw", "tracew", "friedman",
"rubin", "ssi", "likelihood", and "all" for all the indexes. Abbreviations
of these names are also accepted.
y is the object produced by the cclust function in the same package. However, my clustering algorithm is coded in Matlab, and I want to use clustIndex to calculate the indices from the solution that algorithm produces.
One way I can think of is to create a cclust object, fill in its components with values from my solution, and then pass it to clustIndex. Would this be correct, and would it work?
Documentation of the package is available here
Are there any other approaches?

There is no need to create a cclust object; you can just create a list like this:
y = list(cluster = matlabObj$cluster ,
centers = matlabObj$centers ,
withins = matlabObj$withins,
size = matlabObj$size)
Here is an example that uses cclust (substitute your Matlab clustering here) to show that these 4 components are enough for the clustIndex function:
x<- rbind(matrix(rnorm(100,sd=0.3),ncol=2),
matrix(rnorm(100,mean=1,sd=0.3),ncol=2))
matlabObj <- cclust(x,2,20,verbose=TRUE,method="kmeans")
clustIndex(matlabObj,x, index="all")
y = list(cluster = matlabObj$cluster ,
centers = matlabObj$centers ,
withins = matlabObj$withins,
size = matlabObj$size)
identical(clustIndex(y,x, index="all"),
clustIndex(matlabObj,x, index="all"))
[1] TRUE
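If the Matlab solution only provides a vector of cluster labels, the other three components can be derived from the data. The following is a minimal base R sketch, not part of the cclust package: x and labels are hypothetical stand-ins, and it assumes that withins holds the within-cluster sums of squared distances to the centers, as in kmeans-style output.

```r
# Hypothetical stand-ins: x is the data matrix and labels is the cluster
# assignment imported from Matlab (integers 1..K, one per row of x).
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
labels <- rep(1:2, each = 50)

# Cluster sizes, in label order 1..K
size <- as.vector(table(labels))

# Cluster centers: per-cluster column means (one row per cluster)
centers <- rowsum(x, labels) / size

# Within-cluster sum of squared distances to each center
# (assumption: this is what the withins component is expected to hold)
withins <- sapply(seq_along(size), function(k) {
  xk <- x[labels == k, , drop = FALSE]
  sum(sweep(xk, 2, centers[k, ])^2)
})

y <- list(cluster = labels, centers = centers,
          withins = withins, size = size)
str(y)
```

The resulting y has the same four components as the list in the answer, so it could be passed to clustIndex(y, x, index = "all") in the same way.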

Related

How to create a graph file for INLA using region names

That is, how can I use the region.id attribute of an nb object from the spdep package, rather than ignoring it as spdep::nb2INLA does?
I've been trying to link a column in my data containing these regions as a factor, to an INLA model with a graph describing their spatial arrangement.
#something like this
f(rgn16cd,
model = "bym2",
graph = inla_graphs$gb_regions)
It works if I coerce rgn16cd from factor to numeric. Is there a way to get the region names into the graph file?
Here nbs is a list of class nb objects, made from a spatial polygons object whose row.names were given values from a column of the @data slot of that object.
This code should return a graph with named elements as shown.
inla_graphs <- purrr::imap(nbs, ~ {
spdep::nb2INLA(file = glue::glue("{.y}.graph"), nb = .x$nb)
x <- INLA::inla.read.graph(glue::glue("{.y}.graph"))
x$nbs <- lapply(x$nbs, FUN = function(X) {
row.names(.x$mat)[X]
})
names(x$nbs) <- row.names(.x$mat)
unlink(glue::glue("{.y}.graph"))
x
})
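For the workaround mentioned in the question (coercing the factor to numeric), it may be worth checking that the integer codes follow the node order of the graph rather than the alphabetical order of the factor levels. A minimal base R sketch of that check, where the region names and the graph_regions vector are hypothetical and INLA itself is not called:

```r
# Hypothetical region names in the order the nodes appear in the graph
graph_regions <- c("north", "east", "south")

# Hypothetical data with the region stored as a factor
dat <- data.frame(rgn16cd = factor(c("south", "north", "east", "north")))

# Index of each observation's region in graph order
dat$rgn_idx <- match(as.character(dat$rgn16cd), graph_regions)

# Plain coercion uses alphabetical level order, which can differ
dat$rgn_naive <- as.integer(dat$rgn16cd)

stopifnot(!anyNA(dat$rgn_idx))  # every region must exist in the graph
dat
```

Here rgn_idx and rgn_naive disagree for two of the three regions, which is the kind of mismatch that a plain as.numeric() coercion could introduce silently.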

Change title: mcmc_trace function with ggplot

I used the mcmc_trace function from the bayesplot package to draw trace plots from an mcmc list. The result is a ggplot object, so it can be edited further with ggplot functions.
The plot produced by the function is shown below. I need to change the panel titles k[1]...k[20] to subject 1...subject 20. Is there a way to do this with ggplot functions?
A simple reproducible model follows.
library(R2jags)  # the package name is case-sensitive
library(bayesplot)
library(ggplot2)
# data
dlist <- list(
NSubjects = 20,
k = rep (5,20),
n = rep (10,20)
)
# monitor
parameter <- 'theta'
# model
minimodel <- function(){
for (i in 1:NSubjects){
theta [i] ~ dbeta (1,1)
k[i] ~ dbin(theta[i],n[i])
}
}
samples <- jags(dlist, inits=NULL, parameter,
model.file = minimodel,
n.chains=1, n.iter=10, n.burnin=1, n.thin=1, DIC=T)
# mcmc list
codaSamples = as.mcmc.list(samples$BUGSoutput)
# select subjects
colstheta <- sprintf("theta[%d]",1:20)
# plot (this is where I need to change the titles; in this example, theta[1]...theta[20] to subject 1...subject 20)
mcmc_trace(codaSamples[,colstheta]) +
labs (x='Iteration',y='theta value',
title='Traceplot - theta')
Use colnames<- to modify the column names. Since the object is a 1-element list containing a matrix-like object, you need to use [[1]]; if you have multiple chains you'll need to lapply() (or use a for loop) to apply the solution to every chain (i.e., every element in the list).
cc <- codaSamples[,colstheta]
colnames(cc[[1]]) <- gsub("theta\\[([0-9]+)\\]","subject \\1",colnames(cc[[1]]))
mcmc_trace(cc, ...)
The code above finds the numerical element in each name and inserts it into the new name; since you happen to know in this case that these are elements 1:20, you could simplify considerably, e.g.
colnames(cc[[1]]) <- paste("subject",seq(ncol(cc[[1]])))
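For the multi-chain case mentioned above, the same renaming can be applied to every element with a for loop. This is a minimal sketch that uses a plain list of matrices as a hypothetical stand-in for an mcmc.list, so it runs without coda or JAGS; make_chain is an illustrative helper, not part of any package.

```r
# Hypothetical stand-in for a two-chain mcmc.list: a list of matrices
# whose columns are named theta[1], theta[2], theta[3].
make_chain <- function() {
  matrix(runif(12), nrow = 4,
         dimnames = list(NULL, sprintf("theta[%d]", 1:3)))
}
cc <- list(make_chain(), make_chain())

# Apply the same renaming to every chain
for (i in seq_along(cc)) {
  colnames(cc[[i]]) <- gsub("theta\\[([0-9]+)\\]", "subject \\1",
                            colnames(cc[[i]]))
}
colnames(cc[[2]])
```

Assigning in place inside the loop leaves the class of the container untouched, which is one reason to prefer it over lapply() when the object is a real mcmc.list.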

R dtw package: query and reference vectors for binary data to pass it to dtw function

I have two time series tables that look like this.
I calculated the binary column to flag a specific categorical value: if the value is “x”, assign 1; otherwise assign 0. I graphed this using ggplot like this:
p <-ggplot(x1,aes(Time, binary))
p + geom_line()+
xlab("Time in seconds (s)")+
scale_y_continuous(name="x = 1, anything else = 0", breaks=c(0, 1))+
labs(title = "Example of the duration")
I got exactly what I wanted.
I did the same for the second time series and I got this graph.
Now it is time to use the dtw function to calculate the distance. I am not sure how to store this binary data in an array or matrix to pass it to the dtw function:
dtw(
x,
y = NULL,
dist.method = "Euclidean",
step.pattern = symmetric2,
window.type = "none",
keep.internals = FALSE,
distance.only = FALSE,
open.end = FALSE,
open.begin = FALSE,
...
)
where
x is the query vector or local cost matrix and y is reference vector, or NULL if x given as a local cost matrix
What I did is this,
i<-c(x1$binary)
j<-c(x2$binary)
dtw1 <-dtw(i, j, dist.method="Euclidean", keep.internals = T, step.pattern= symmetric)
plot(dtw1)
But this is not correct. The graphs for the two series are not the same, as shown below. The cost matrix is null. It only counts the number of 0s and 1s in each column. I know this is not correct, but I don’t know how to get the query and reference vectors to calculate the DTW. How can I apply it to this binary data?
What I did next was this: instead of passing the binary values to the dtw function, I used the time values.
i<-c(x1$Time)
j<-c(x2$Time)
dtw1 <-dtw(i, j, dist.method="Euclidean", keep.internals = T, step.pattern= asymmetric)
plot(dtw1)
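For intuition about what a query and a reference vector mean for 0/1 data, the classic DTW recursion can be written in a few lines of base R. This is only a sketch under stated assumptions: dtw_distance is a hypothetical helper, it uses the absolute difference as local cost and the simplest step rule (minimum of the three neighbouring cells), and it is not the dtw package's implementation or any of its step patterns.

```r
# Hypothetical helper: accumulated DTW cost between two numeric vectors
dtw_distance <- function(query, reference) {
  n <- length(query)
  m <- length(reference)
  # Local cost matrix: |query[i] - reference[j]|
  cost <- abs(outer(query, reference, "-"))
  # Accumulated cost, padded with an Inf border
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      acc[i + 1, j + 1] <- cost[i, j] +
        min(acc[i, j], acc[i, j + 1], acc[i + 1, j])
    }
  }
  acc[n + 1, m + 1]
}

q <- c(0, 0, 1, 1, 0)  # hypothetical binary query
r <- c(0, 1, 1, 0, 0)  # hypothetical binary reference
dtw_distance(q, r)
```

The two example series have the same shape shifted in time, so warping aligns them at zero cost. With binary input the local cost is either 0 or 1, so the result counts mismatched alignments along the best warping path.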

How to reorder cluster leaves (columns) when plotting pheatmap in R?

I am plotting a set of 15 samples clustered in three groups A, B, C, and the heatmap orders them as C, A, B. (I have read this is because it plots the cluster with the strongest similarity on the right.) I would like to order the clusters so that the leaves are shown as A, B, C (that is, to reorganise the order of the cluster branches). Is there a function that can help me do this?
The code I have used:
library(pheatmap)
pheatmap(mat, annotation_col = anno,
color = colorRampPalette(c("blue", "white", "red"))(50), show_rownames = F)
(cluster_cols=FALSE would not cluster the samples at all, but that is not what I want)
I have also found this on another forum, but I am unsure how to change the function code and whether it would work for me:
clustering_callback callback function to modify the clustering. Is
called with two parameters: original hclust object and the matrix used
for clustering. Must return a hclust object.
Hi, I am not sure if this is of any help to you, but if you check ?pheatmap and scroll down to the examples, the last snippet of code gives exactly that example.
# Modify ordering of the clusters using clustering callback option
callback = function(hc, mat){
sv = svd(t(mat))$v[,1]
dend = reorder(as.dendrogram(hc), wts = sv)
as.hclust(dend)
}
pheatmap(test, clustering_callback = callback)
I tried it on my heatmap and the function defined above sorted the clusters exactly the way I needed them. I have to admit, though (as I am new to R), that I don't fully understand what the callback function does.
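To see what the reordering step of such a callback does, it can be tried in isolation with base R. This is a minimal sketch on hypothetical data: reorder() flips branches so that leaves with smaller weights tend to come first, without changing which samples are clustered together. The weights here are made up; the pheatmap example derives them from an SVD instead.

```r
set.seed(42)
# Hypothetical data: 8 labelled samples, 5 variables
mat <- matrix(rnorm(40), nrow = 8,
              dimnames = list(paste0("s", 1:8), NULL))
hc <- hclust(dist(mat))

# Hypothetical weights, one per leaf: at each node, branches are
# arranged in increasing order of their aggregated weight
wts <- 8:1
dend <- reorder(as.dendrogram(hc), wts = wts, agglo.FUN = mean)
hc2 <- as.hclust(dend)

hc$order   # leaf order before reordering
hc2$order  # leaf order after reordering
```

Because only the left/right orientation of branches changes, the result is still a valid hclust object, which is what clustering_callback is required to return.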
Maybe you can also write a function with the dendsort package, as I know you can reorder the branches of a dendrogram with it.
In this case, luckily, the clustering of the columns coincides with the sample number order (which is similar to the dendrogram), so I added cluster_cols = FALSE. That solved the issue of re-clustering the columns and avoided writing the callback function.
pheatmap(mat,
annotation_col = anno,
fontsize_row = 2,
show_rownames = T,
cutree_rows = 3,
cluster_cols = FALSE)
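# install.packages("dendsort")
library(pheatmap)
library(dendsort)
# Sort the branches of a hierarchical clustering with dendsort
sort_hclust <- function(...) as.hclust(dendsort(as.dendrogram(...)))
# Columns are clustered on the transposed matrix; cluster_cols accepts an hclust object
pheatmap(mat, cluster_cols = sort_hclust(hclust(dist(t(mat)))))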

Displaying TraMineR (R) dendrograms in text/table format

I use the following R code to generate a dendrogram (see attached picture) with labels based on TraMineR sequences:
library(TraMineR)
library(cluster)
clusterward <- agnes(twitter.om, diss = TRUE, method = "ward")
plot(clusterward, which.plots = 2, labels=colnames(twitter_sequences))
The full code (including dataset) can be found here.
As informative as the dendrogram is graphically, it would be handy to get the same information in text and/or table format. If I call any of the aspects of the object clusterward (created by agnes), such as "order" or "merge" I get everything labeled using numbers rather than the names I get from colnames(twitter_sequences). Also, I don't see how I can output the groupings represented graphically in the dendrogram.
To summarize: How can I get the cluster output in text/table format with the labels properly displayed using R and ideally the traminer/cluster libraries?
The question concerns the cluster package. The help page for the agnes.object returned by agnes (see http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/agnes.object.html ) states that this object contains an order.lab component "similar to order, but containing observation labels instead of observation numbers. This component is only available if the original observations were labelled."
The dissimilarity matrix (twitter.om in your case) produced by TraMineR does currently not retain the sequence labels as row and column names. To get the order.lab component you have to manually assign sequence labels as both the rownames and colnames of your twitter.om matrix. I illustrate here with the mvad data provided by the TraMineR package.
library(TraMineR)
data(mvad)
## attaching row labels
rownames(mvad) <- paste("seq",rownames(mvad),sep="")
mvad.seq <- seqdef(mvad[17:86])
## computing the dissimilarity matrix
dist.om <- seqdist(mvad.seq, method = "OM", indel = 1, sm = "TRATE")
## assigning row and column labels
rownames(dist.om) <- rownames(mvad)
colnames(dist.om) <- rownames(mvad)
dist.om[1:6,1:6]
## Hierarchical cluster with agnes
library(cluster)
cward <- agnes(dist.om, diss = TRUE, method = "ward")
## here we can see that cward has an order.lab component
attributes(cward)
That gives you the order with sequence labels rather than numbers. However, it is not clear to me which cluster outcome you want in text/table form. From the dendrogram you decide where to cut it, i.e., how many groups you want, and then cut it with cutree, e.g. cl.4 <- cutree(cward, k = 4). The result cl.4 is a vector giving the cluster membership of each sequence, and you get the list of the members of group 1, for example, with rownames(mvad.seq)[cl.4==1].
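The cutree step can be turned into a text or table listing with base R alone. The following is a minimal sketch on hypothetical labelled data, using hclust instead of agnes so that it runs without TraMineR or cluster; the labels seq1...seq6 and the values are made up.

```r
# Hypothetical labelled observations forming two obvious groups
pts <- matrix(c(0, 0.1, 0.2, 5, 5.1, 5.2), ncol = 1,
              dimnames = list(paste0("seq", 1:6), NULL))
hc <- hclust(dist(pts), method = "ward.D2")

# Cluster membership per label for a 2-group cut
cl <- cutree(hc, k = 2)

# Table format: one row per observation
membership <- data.frame(label = names(cl), cluster = cl, row.names = NULL)
print(membership)

# Text format: members listed per cluster
split(names(cl), cl)
```

The same two calls (data.frame and split) apply to a membership vector such as cl.4, provided the labels were attached to the dissimilarity matrix beforehand.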
Alternatively, you can use the identify method (see ?identify.hclust) to select the groups interactively from the plot, but you need to pass the tree as as.hclust(cward). Here is the code for the example:
## plot the dendrogram
plot(cward, which.plots = 2, labels=FALSE)
## and select the groups manually from the plot
x <- identify(as.hclust(cward)) ## Terminate with second mouse button
## number of groups selected
length(x)
## list of members of the first group
x[[1]]
Hope this helps.
