Create a hierarchical clustering dendrogram for an integrated Seurat object in R?

Does anybody know how to create a dendrogram for an integrated Seurat object? I can do it for a non-integrated object, but when I try:
immune.combined <- BuildClusterTree(object = immune.combined, slot = "data")
I see the error:
Error in hclust(d = data.dist) : NA/NaN/Inf in foreign function call (arg 10)

If you followed the normal Seurat workflow, at some point you will have changed the default assay to "RNA". Looking at the source for BuildClusterTree, it uses the most variable features from the chosen assay (var.features in the Large Seurat object under your chosen assay). For the integrated workflow, you only calculated these values for the "integrated" assay, not the RNA assay. You therefore need to do the analysis on the integrated assay. That would imply something like this:
sampleIntegrated <- BuildClusterTree(sampleIntegrated,assay="integrated")
For some reason that does not work, and the same error is produced. If you first explicitly set the default assay to integrated, however, it works:
DefaultAssay(sampleIntegrated) <- "integrated"
sampleIntegrated <- BuildClusterTree(sampleIntegrated,assay="integrated")
You can then use your visualization method of choice. For example, using the ggtree package and Tool from Seurat:
library(ggtree)
myPhyTree <- Tool(object=sampleIntegrated, slot = "BuildClusterTree")
ggtree(myPhyTree)+geom_tiplab()+theme_tree()+xlim(NA,400)
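As a rough illustration of where the error probably comes from: if the assay in use has no variable features, the averaged cluster-by-feature matrix is empty, every pairwise distance is NA, and hclust fails. The base-R sketch below reproduces that situation under this assumption. The toy matrix cluster_avg is a hypothetical stand-in and is not Seurat code.

```r
# Toy stand-in: 3 clusters (rows) described by 0 features (columns),
# as would happen if the variable-feature list were empty (assumption).
cluster_avg <- matrix(numeric(0), nrow = 3, ncol = 0,
                      dimnames = list(c("c0", "c1", "c2"), NULL))

# With no columns to compare, every pairwise distance is NA.
d <- dist(cluster_avg)
print(d)

# hclust() cannot handle NA distances; capture the error message instead of stopping.
res <- tryCatch(hclust(d), error = function(e) conditionMessage(e))
print(res)
```

If this is indeed the cause, checking that the assay you cluster on actually has variable features before calling BuildClusterTree should tell you whether you will hit the error.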

Related

R implementation of kohonen SOMs: prediction error due to data type.

I have been trying to run an example code for supervised kohonen SOMs from https://clarkdatalabs.github.io/soms/SOM_NBA . When I tried to predict test set data I got the following error:
pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing)
Error in FUN(X[[i]], ...) :
Data type not allowed: should be a matrix or a factor
I tried newdata = as.matrix(NBA.testing) but it did not help. Neither did as.factor().
Why does it happen? And how can I fix that?
You should pass one more argument to the predict function, namely "whatmap", and set its value to 1.
The code would look like this:
pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing, whatmap = 1)
To verify the prediction result, you can check using:
table(NBA$Pos[-training_indices], pos.prediction$predictions[[2]], useNA = 'always')
The result may differ from the tutorial's, since the tutorial does not call set.seed().
I suggest calling set.seed() with an arbitrary number somewhere before the training phase.
For simplicity, put it once at the very top of your script, e.g.
set.seed(12345)
This will guarantee a reproducible result of your model next time you re-run your script.
Hope that will help.
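A minimal base-R sketch of what set.seed() buys you, using sample() as a stand-in for the random initialisation inside the SOM training (the variable names are only for illustration):

```r
# Same seed, same random draws: this is what makes a stochastic
# training step repeatable between runs of a script.
set.seed(12345)
first_run <- sample(1:100, 5)

set.seed(12345)
second_run <- sample(1:100, 5)

identical(first_run, second_run)  # TRUE

# Without resetting the seed, the next call continues the random stream.
third_run <- sample(1:100, 5)
```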

Using mRMRe in R

I am currently working on a project where I have to do some feature selection for building a predictive model. I was led to an R package called mRMRe. I am just trying to run the example but cannot get it working. The example can be found here - http://www.inside-r.org/packages/cran/mRMRe/docs/mRMR.ensemble.
Here is my code -
data(cgps)
data <- data.frame(target=cgps.ic50, cgps.ge)
mRMR.ensemble(data, 1, rep.int(1, 30))
When I run this code I get the error -
Error in .local(.Object, ...) : data must be of type mRMRe.Data.
I dug a little further and found that you actually have to convert the data to the mRMRe.Data type. So I made this update -
# Update
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data, 1, rep.int(1, 30))
but I still get the same error. When I look at the class I have -
> class(data)
[1] "mRMRe.Data"
attr(,"package")
[1] "mRMRe"
So the data is the requested type but the code is still not functional.
Does anyone have experience using this package? Any help or comments would be appreciated!
I also want to note that, compared with the example from the link, when I load the data
cgps_ic50 -> cgps.ic50
cgps_ge -> cgps.ge
so the names of the data aren't the same as those in the example.
With the code you wrote:
data(cgps)
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data, 1, rep.int(1, 30))
The function mRMR.ensemble receives data as its first positional argument, but the first parameter of this function is solution_count.
I understand that your intention in running that example is to find 30 relevant and non-redundant features using the classic mRMR feature selection algorithm, so try this:
data(cgps)
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data = data, target_indices = 1,
feature_count = 30, solution_count = 1)
The target_indices are the positions, in the original data.frame, of the variables against which relevance (correlation or another quality measure) is maximized, so the features selected in the end will be good at explaining the variables indicated by target_indices.
For example, in a classification problem, we would choose the position of the class variable as the value for the target_indices parameter.
The feature_count parameter indicates the number of variables to be chosen.
The solution_count is not a parameter of the classic mRMR. It indicates the number of mRMR algorithms to be ensembled to get a final feature selection, so if set to 1 it performs only one classic mRMR.
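To make the roles of target_indices and feature_count more concrete, here is a toy greedy mRMR in base R. It is only a sketch and not the mRMRe implementation: it assumes absolute Pearson correlation for both relevance and redundancy, and greedy_mrmr and the toy data frame are hypothetical names made up for this illustration.

```r
# Toy greedy mRMR (NOT the mRMRe implementation): at each step pick the
# candidate maximising relevance to the target minus mean redundancy
# with the features already selected.
greedy_mrmr <- function(df, target_index, feature_count) {
  target <- df[[target_index]]
  candidates <- setdiff(seq_along(df), target_index)
  selected <- integer(0)
  while (length(selected) < feature_count && length(candidates) > 0) {
    scores <- vapply(candidates, function(j) {
      relevance <- abs(cor(df[[j]], target))
      redundancy <- 0
      if (length(selected) > 0) {
        redundancy <- mean(vapply(selected, function(s) {
          abs(cor(df[[j]], df[[s]]))
        }, numeric(1)))
      }
      relevance - redundancy
    }, numeric(1))
    best <- candidates[which.max(scores)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  names(df)[selected]
}

# Toy data: a and b both drive the target, a_copy nearly duplicates a,
# and noise is unrelated to everything.
set.seed(1)
n <- 200
a <- rnorm(n)
b <- rnorm(n)
toy <- data.frame(
  target = a + b + rnorm(n, sd = 0.5),
  a      = a,
  a_copy = a + rnorm(n, sd = 0.01),
  b      = b,
  noise  = rnorm(n)
)

picked <- greedy_mrmr(toy, target_index = 1, feature_count = 2)
print(picked)
```

The point of the redundancy term is that a and a_copy should not both be chosen, even though each is individually relevant. Running several such selections and combining them is roughly what a solution_count greater than 1 asks for.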

Univariate feature selection in caret

I would like to select features based on anovaScores in caret. I can get the scores with scores <- apply(train_data, 2, anovaScores, train_data$target) and then sort the features and select the n best ones, but I don't know how to do it with sbfControl. The documentation for anovaScores says: "The functions described here are passed to the algorithm via the functions argument of sbfControl."
Doing
featSel_ctrl <- sbfControl(functions = anovaScores)
featSel <- sbf(target ~., data=train_data, sbfControl = featSel_ctrl)
doesn't work; it produces the error 'object of type 'closure' is not subsettable'.
The functions argument expects a list of functions, and you are excluding its other elements. See the documentation, which has some details. If you are doing classification, anovaScores is already being used.
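The error message itself can be reproduced in base R, which may help explain it. The sketch below assumes that sbf looks up elements of the functions list with `$`. The element names score and filter mirror what I believe caretSBF contains, but the function bodies are placeholders, not caret's.

```r
# A bare function cannot be indexed with `$`, which is what code
# expecting a list of functions will try to do.
score_only <- function(x, y) 0
msg <- tryCatch(score_only$score, error = function(e) conditionMessage(e))
print(msg)

# A named list of functions can be indexed. Names are assumed to match
# caretSBF; the bodies are hypothetical placeholders.
toy_functions <- list(
  score  = function(x, y) 0,
  filter = function(score, x, y) score <= 0.05
)
toy_functions$score(1, 2)
```

With caret itself, the analogous step would presumably be to copy caretSBF, overwrite its score element with your own function, and pass the modified list to sbfControl(functions = ...).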

Multichannel sequence analysis through WeightedCluster package

I would like to apply the functions available in the WeightedCluster package to analyze multichannel sequences I obtained through TraMineR. I have been trying, but because multichannel sequences are lists made up of each channel separately, I get errors in functions like seqtreedisplay() and all those which require a sequence object.
This is an example:
fullsequences <- list(
work_sequence2 = work_sequence[which(rownames(work_sequence) %in% commonid),],
educ_sequence2 = educ_sequence[which(rownames(educ_sequence) %in% commonid),],
part_sequence2 = part_sequence[which(rownames(part_sequence) %in% commonid),],
kid_sequence2 = kid_sequence[which(rownames(kid_sequence) %in% commonid),]
) # a total of 926 with complete sequences on all channels
multidist <- seqdistmc(
channels = fullsequences,
method = "OM",
norm = FALSE,
sm = list("TRATE","TRATE","TRATE","TRATE"),
with.missing=FALSE,
full.matrix=TRUE,
link="sum")
clusterward <- hclust(as.dist(multidist), method = "ward")
seqtreedisplay(as.seqtree(clusterward, ncluster = 5,
seqdata = fullsequences , diss = multidist))
Error in seqlegend(seqdata, fontsize = legend.fontsize, title = "Legend", :
data is not a sequence object, use seqdef function to create one
Is there a way to use the functionality of the WeightedCluster package on a multichannel-type object (a list of sequences)? I am especially interested in using the Partitioning Around Medoids algorithm with initial Ward clusters (function wcKMedoids()). If that is not possible, what is the best alternative for clustering multichannel sequences in R?
Thanks a lot in advance!
The as.seqtree function (from WeightedCluster) requires an object of class stslist (as produced by the TraMineR seqdef function) as seqdata argument. In your case, fullsequences is a list of such objects (the list of parallel sequences), which is NOT itself of class stslist. This causes the error.
Even if you were able to define a tree of parallel sequences, the problem would be that seqtreedisplay does not know how to plot parallel sequences. This means that you would have to define a plot function for a list of state sequences and, using the more general disstreedisplay function instead of seqtreedisplay, pass the plot function as the imagefunc argument.
To summarize, there are two problems. First you need some as.disstree equivalent of as.seqtree that would work for hierarchical clustering of non-stslist objects. Second, you need a plot function for parallel sequences. The first problem is purely technical and should be easily solved. The second is more conceptual.
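Note that the clustering step itself only needs the dissimilarity matrix, so the list-of-channels issue mainly affects the display functions. As a sketch of the "medoids from initial Ward clusters" idea in base R (this is not WeightedCluster code; the toy matrix diss is a hypothetical stand-in for multidist, and "ward.D" is used as the current name of the old "ward" method):

```r
# Toy dissimilarity matrix standing in for the multichannel distances.
set.seed(42)
pts  <- matrix(rnorm(40), ncol = 2)   # 20 toy "individuals"
diss <- as.matrix(dist(pts))

# Ward clustering works directly on the dissimilarities.
ward   <- hclust(as.dist(diss), method = "ward.D")
groups <- cutree(ward, k = 3)

# For each Ward cluster, the medoid is the member with the smallest
# total dissimilarity to the other members of that cluster.
medoids <- vapply(split(seq_along(groups), groups), function(idx) {
  idx[which.min(rowSums(diss[idx, idx, drop = FALSE]))]
}, integer(1))
print(medoids)
```

As far as I know, wcKMedoids also takes a dissimilarity matrix rather than a sequence object, so passing multidist and the Ward solution to it should be possible. Check its help page for the exact argument names.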

R: finding the source code that produces the output for S4 slot?

G'day Everyone,
When the 'lmer' function in 'lme4' runs, it produces an S4 object with a lot of slots. I am interested in one of these slots, namely model@X, and in how this 'X' slot output is produced. I want to try to reproduce this output for a different model function (glmmPQL) I am using, which does not automatically produce this 'X' output (FYI 'lmer' produces an object of class 'mer', and slot 'X' is a model matrix for the fixed effects). The code below shows what I am talking about.
What I want to figure out is how they produced this 'X' data. I looked at the code for 'lmer' by typing it in the terminal without '()' but I couldn't find anything there. I also tried showMethods('lmer') but it says Function "lmer": <not an S4 generic function>.
Just wondering if there is a way to get the source code for what the 'X' slot is doing in particular (or any slot in an S4 object)? Or does anyone know how to reproduce this? Thanks a lot for your help and time.
library(lme4)
# here is a quick example of what I am looking at using the cake dataset in the 'lme4' package
m <- lmer(angle ~ temp + recipe + (1 | replicate), family = gaussian, data = cake)
slotNames(m)
head(m@X)
You started off okay by printing lmer. That won't show you where m@X is set, but you can see which functions are called by lmer.
The methods within lmer can be accessed using lme4:::methodName.
If you look inside lme4:::lmer_finalize, you'll see (paraphrasing):
ans <- new(Class = "mer", ..., X = fr$X, ...)
So that's where the @X slot is being populated. Back up in lmer you'll see that fr comes from lme4:::lmerFrames, and specifically fr$X is calculated by:
X <- if (!is.empty.model(mt))
model.matrix(mt, mf, contrasts)
else matrix(, NROW(Y), 0)
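The same construction can be tried outside lme4. The sketch below mirrors the model.matrix(mt, mf, contrasts) call quoted above, using the built-in warpbreaks data and default contrasts as stand-ins (the formula and data set are only for illustration):

```r
# Build the model frame and its terms object, then the fixed-effects
# design matrix from them.
mf <- model.frame(breaks ~ wool + tension, data = warpbreaks)
mt <- attr(mf, "terms")
X  <- model.matrix(mt, mf)

dim(X)       # 54 rows, 4 columns
colnames(X)  # "(Intercept)" "woolB" "tensionM" "tensionH"
head(X)
```

For a glmmPQL fit, applying the same two steps to the fixed-effects formula and the data used for fitting should give the equivalent of the 'X' slot.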
