New gplots update in R cannot find function distfun in heatmap.2 - r

I have a bit of R code that makes a heatmap from a correlation matrix. It worked the last time I used it (prior to the 2013 Oct 17 update of gplots; after updating to R Version 3.0.2). This makes me think that something changed in the most recent gplots update, but I cannot figure out what.
What used to produce a nice plot now gives me this error:
Error in hclustfun(distfun(x)) : could not find function "distfun"
and won't plot anything. Below is the code to reproduce the plot (heavily commented as I was using it to teach an undergrad how to use heatmaps for a project). I tried adding the last line to explicitly set the functions, but it didn't help resolve the problem.
EDIT: I changed the last line of code to read:
,distfun =function(c) {as.dist(1-c,upper=FALSE)}, hclustfun=hclust)
and it worked. When I used just "dist=as.dist" I got a plot, but it wasn't sorted correctly, and several of the dendrogram branches didn't connect to the tree. I'm not sure what happened or why this version works, but it appears to.
Any help would be greatly appreciated.
Thanks in advance,
library(gplots)
set.seed(12345)
randData <- as.data.frame(matrix(rnorm(600),ncol=6))
randDataCorrs <- randData+(rnorm(600))
names(randDataCorrs) <- paste(names(randDataCorrs),"_c",sep="")
randDataExtra <- cbind(randData,randDataCorrs)
randDataExtraMatrix <- cor(randDataExtra)
heatmap.2(randDataExtraMatrix, # sets the correlation matrix to use
symm=TRUE, #tells whether it is symmetrical or not
main= "Correlation matrix\nof Random Data Cor", # Names plot
xlab= "x groups",ylab="", # Sets the x and y labels
scale="none", # Tells it not to scale the data
col=redblue(256),# Sets the colors (can be manual, see below)
trace="none", # tells it not to add a trace
symkey=TRUE,symbreaks=TRUE, # Tells it to keep things symmetric around 0
density.info = "none", # can be "histogram" if you want a hist of your corr values here
# previously tried (did not help): distfun=dist, hclustfun=hclust
distfun = function(c) {as.dist(1-c, upper=FALSE)}, hclustfun=hclust) # new last line: 1 - correlation as the distance
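A base-R sketch (no gplots needed) of why the replacement distfun can cluster differently from the default: dist() computes Euclidean distances between the rows of the correlation matrix, treating each row as a profile, while as.dist(1 - c) reads each correlation directly as a similarity, so r = 1 becomes distance 0 and r = -1 becomes distance 2. The data construction mirrors the question's code; the variable names are hypothetical.

```r
set.seed(12345)
x <- matrix(rnorm(600), ncol = 6)
# Same idea as cbind(randData, randData + noise) in the question
cm <- cor(cbind(x, x + rnorm(600)))

# Default behaviour: Euclidean distance between rows of the correlation matrix
d_default <- dist(cm)

# Replacement distfun from the EDIT: 1 - correlation used directly as a distance
d_corr <- as.dist(1 - cm, upper = FALSE)

# Both are "dist" objects that hclust() accepts, but they encode different notions of distance
hc_default <- hclust(d_default)
hc_corr <- hclust(d_corr)
hc_corr$order
```

Which one is "right" depends on whether you want to group variables by their raw correlation or by the similarity of their whole correlation profiles.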

I had the same error, and then noticed that I had created a variable called dist. The default for heatmap.2's distfun argument is distfun = dist, so my variable was picked up instead of the dist() function. I renamed the variable and everything ran fine. You likely made the same mistake; your new code works because it no longer relies on the default distfun.
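A minimal reproduction of this masking effect, assuming heatmap.2's defaults behave like the hypothetical cluster_rows() below. Here the function lives in the global environment, so the default distfun = dist finds the user's variable directly; the exact lookup path inside the real heatmap.2 depends on how gplots resolves dist, which this sketch does not check.

```r
# Hypothetical stand-in for heatmap.2's defaults (distfun = dist, hclustfun = hclust)
cluster_rows <- function(x, distfun = dist, hclustfun = hclust) {
  hclustfun(distfun(x))
}

set.seed(1)
m <- matrix(rnorm(40), nrow = 8)

ok <- cluster_rows(m)                 # works: `dist` resolves to stats::dist

dist <- data.frame(a = 1:3)           # an unrelated object that happens to be called dist
msg <- tryCatch(cluster_rows(m), error = function(e) conditionMessage(e))
print(msg)                            # the same "could not find function" error as in the question

fixed <- cluster_rows(m, distfun = stats::dist)  # naming the namespace sidesteps the clash
rm(dist)                              # or rename/remove the offending variable
```

The non-function `dist` becomes the value of distfun, and calling distfun(x) then fails because R skips non-function objects when it looks up a name in call position.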

Related

Plot not showing in Julia

I have a file named mycode.jl with the following code, taken from here.
using MultivariateStats, RDatasets, Plots
# load iris dataset
println("loading iris dataset:")
iris = dataset("datasets", "iris")
println(iris)
println("loaded; splitting dataset: ")
# split half to training set
Xtr = Matrix(iris[1:2:end,1:4])'
Xtr_labels = Vector(iris[1:2:end,5])
# split other half to testing set
Xte = Matrix(iris[2:2:end,1:4])'
Xte_labels = Vector(iris[2:2:end,5])
print("split; Performing PCA: ")
# Suppose Xtr and Xte are training and testing data matrix, with each observation in a column. We train a PCA model, allowing up to 3 dimensions:
M = fit(PCA, Xtr; maxoutdim=3)
println(M)
# Then, apply PCA model to the testing set
Yte = predict(M, Xte)
println(Yte)
# And, reconstruct testing observations (approximately) to the original space
Xr = reconstruct(M, Yte)
println(Xr)
# Now, we group results by testing set labels for color coding and visualize first 3 principal components in 3D plot
println("Plotting fn:")
setosa = Yte[:,Xte_labels.=="setosa"]
versicolor = Yte[:,Xte_labels.=="versicolor"]
virginica = Yte[:,Xte_labels.=="virginica"]
p = scatter(setosa[1,:],setosa[2,:],setosa[3,:],marker=:circle,linewidth=0)
scatter!(versicolor[1,:],versicolor[2,:],versicolor[3,:],marker=:circle,linewidth=0)
scatter!(virginica[1,:],virginica[2,:],virginica[3,:],marker=:circle,linewidth=0)
plot!(p,xlabel="PC1",ylabel="PC2",zlabel="PC3")
println("Reached end of program.")
I run the above code from a Linux terminal with the command: julia mycode.jl
The code runs fine and reaches the end, but the plot does not appear.
Where is the problem, and how can it be solved?
As the Output section of the Plots docs says:
A Plot is only displayed when returned (a semicolon will suppress the return), or if explicitly displayed with display(plt), gui(), or by adding show = true to your plot command.
You can have MATLAB-like interactive behavior by setting the default value: default(show = true)
The first part about "when returned" is about when you call plot from the REPL (or Jupyter, etc.), and doesn't apply here.
Here, you can use one of the other options:
calling display(p) after the last plot! call (this is the most common way to do it)
calling gui() after the last plot!
adding a show = true argument to the last plot! call
setting the default to always show the plot by setting Plots.default(show = true) at the beginning of the script
Any one of these is sufficient to make the plot window appear.
The plot closes when the Julia process ends. If that happens too soon, you can either:
Run your code as julia -i mycode.jl at the terminal - this will run your code, display the plot, and then land you at the Julia REPL. This will both keep the plot open, and let you work with the variables in your code further if you need to.
add a readline() call at the end of your program. This will keep Julia waiting for an extra press of newline/Enter/Return key, and the plot will remain in display until you press that.
(Credit to ffevotte on Julia Discourse for these suggestions.)

FactoMineR plot.HCPC function for cluster labeling

This is the plot.HCPC function from the FactoMineR package:
https://github.com/cran/FactoMineR/blob/master/R/plot.HCPC.R
As an example, this is the code I ran:
library(FactoMineR)
res.pca <- PCA(iris[, -5], scale.unit = TRUE)
hc <- HCPC(res.pca, nb.clust=-1)
plot.HCPC(hc, choice="3D.map", angle=60)
hc$call$X$clust <- factor(hc$call$X$clust, levels = unique(hc$call$X$clust))
plot(hc, choice="map")
The difference is this: when I run hc$call$X$clust <- factor(hc$call$X$clust, levels = unique(hc$call$X$clust)) before plot.HCPC, it does not change the annotation in the figure, but when I run the same line before plot(hc, choice="map"), the change is reflected in the final output.
When I look at the plot.HCPC function, these are the lines of code that add the cluster information to the figure:
for(i in 1:nb.clust) leg=c(leg, paste("cluster",levs[i]," ", sep=" "))
legend("topleft", leg, text.col=as.numeric(levels(X$clust)),cex=0.8)
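To see what re-levelling does to those two lines, here is a base-R sketch on a toy factor. It assumes levs inside plot.HCPC is levels(X$clust), which the quoted lines do not show, and the cluster values are made up. Because text.col comes from as.numeric(levels(X$clust)), the colour follows the cluster label rather than its position, so re-levelling reorders the legend entries while each cluster keeps its colour number.

```r
# Toy cluster labels standing in for hc$call$X$clust (values are made up)
clust <- factor(c(3, 1, 1, 2, 3, 2))
relevelled <- factor(clust, levels = unique(clust))

# Legend text built like the quoted loop, assuming levs = levels(X$clust)
legend_text <- function(f) {
  levs <- levels(f)
  leg <- NULL
  for (i in seq_along(levs)) leg <- c(leg, paste("cluster", levs[i], " ", sep = " "))
  leg
}

legend_text(clust)               # entries for clusters 1, 2, 3
legend_text(relevelled)          # entries for clusters 3, 1, 2
as.numeric(levels(clust))        # text colours 1, 2, 3
as.numeric(levels(relevelled))   # text colours 3, 1, 2: each label keeps its colour
```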
My question: I have worked with small functions where I understand what each edit or modification does, but this function is complicated, at least for me, so I'm not sure how to modify that part to get what I would like to see.
I would like each cluster in my 3D dendrogram to be labeled with its group, the way ComplexHeatmap lets you annotate rows or columns with a color code, so that the order in the data frame doesn't matter and the clusters can still be identified. (I know it's just a visual thing, but I would like to learn how to modify these.)

Label outliers using mvOutlier from MVN in R

I'm trying to label outliers on a Chi-square Q-Q plot using mvOutlier() function of the MVN package in R.
I have managed to identify the outliers by their labels and get their x-coordinates. I tried placing the former on the plot using text(), but the x- and y-coordinates seem to be flipped.
Building on an example from the documentation:
library(MVN)
data(iris)
versicolor <- iris[51:100, 1:3]
# Mahalanobis distance
result <- mvOutlier(versicolor, qqplot = TRUE, method = "quan")
labelsO<-rownames(result$outlier)[result$outlier[,2]==TRUE]
xcoord<-result$outlier[result$outlier[,2]==TRUE,1]
text(xcoord,label=labelsO)
This produces the following:
I also tried text(x = xcoord, y = xcoord,label = labelsO), which is fine when the points are near the y = x line, but might fail when normality is not satisfied (and the points deviate from this line).
Can someone suggest how to access the Chi-square quantiles or why the x-coordinate of the text() function doesn't seem to obey the input parameters.
Looking inside the mvOutlier function, it doesn't seem to save the chi-squared values. Right now your text() call treats xcoord as the y-values and uses their indices (1:2) as the x-values. Fortunately, the chi-squared value is a fairly simple calculation, as it is rank-based in this case.
result <- mvOutlier(versicolor, qqplot = TRUE, method = "quan")
labelsO<-rownames(result$outlier)[result$outlier[,2]==TRUE]
xcoord<-result$outlier[result$outlier[,2]==TRUE,1]
#recalculate chi-squared values for ranks 50 and 49, i.e. p = ((size:(size - n.outliers + 1)) - 0.5)/size, with df = n.variables = 3
chis = qchisq(((50:49)-0.5)/50,3)
text(xcoord,chis,label=labelsO)
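If you want coordinates for every point instead of hard-coding ranks 50 and 49, the same rank-based quantiles can be computed for the whole sample. This sketch swaps in classical squared Mahalanobis distances from base R's mahalanobis() rather than the robust distances that method = "quan" uses, so the y-values will not match mvOutlier's plot exactly; the 0.975 cutoff is also an assumption. It does reproduce the qchisq(((50:49)-0.5)/50, 3) values used above.

```r
x <- iris[51:100, 1:3]                      # versicolor, as in the question
n <- nrow(x); p <- ncol(x)

# Classical squared Mahalanobis distances (swapped in for the robust ones)
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Sort the distances and pair them with chi-square quantiles at (i - 0.5)/n
ord <- order(d2)
d2_sorted <- d2[ord]
chi_q <- qchisq((seq_len(n) - 0.5) / n, df = p)

plot(chi_q, d2_sorted, xlab = "Chi-square quantile",
     ylab = "Squared Mahalanobis distance")
abline(0, 1, lty = 2)

# Label points beyond an assumed 97.5% cutoff, giving text() both coordinates
cutoff <- qchisq(0.975, df = p)
is_out <- d2_sorted > cutoff
text(chi_q[is_out], d2_sorted[is_out], labels = rownames(x)[ord][is_out], pos = 2)
```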
As mentioned in the previous reply, the MVN package does not support labeling outliers. Although this is not strictly necessary, since it can be done manually, we may consider adding a "label outliers" option to the mvOutlier(...) function. Thanks for your interest; we might include it in a future update of the package.
The web-based version of the MVN package now has the ability to label outliers (Advanced options under the Outlier detection tab). You can access this web tool at http://www.biosoft.hacettepe.edu.tr/MVN/

Bandwidth when plotting densities in R

When I plot the density of wind direction using the circular package, I get the error shown below. Can someone explain what bw (bandwidth) value I need for this amount of data?
plot(density(dirCir))
Error in density.circular(dirCir) :
argument "bw" is missing, with no default
This is the actual code that I have.
library (circular)
dir <-c(308,351,330,16,3,346,345,345,287,359,345,358,336,335,346,16,325,354,5,354,322,340,6,278,354,343,261,353,288,8)
dirCir <- circular(dir, units ="degrees", template = "geographics")
mean(dirCir)
var(dirCir)
summary(dirCir)
plot(dirCir)
plot(density(dirCir))
rose.diag(dirCir, main = 'dir Data')
points(dirCir)
As @eipi10 says, bw has to be chosen explicitly. Depending on the kernel you choose, large and small values of this bandwidth parameter can produce anything from spiky density estimates to very smooth ones.
Common practice is to try several values and choose the one that seems to describe the data best. However, the following functions provide more objective ways of selecting bw:
> bw.cv.mse.circular(dirCir)
[1] 21.32236
> bw.cv.mse.circular(dirCir, kernel = "wrappednormal")
[1] 16.97266
> bw.cv.ml.circular(dirCir)
[1] 19.71197
> bw.cv.ml.circular(dirCir, kernel = "wrappednormal")
[1] 0.2280636
> bw.nrd.circular(dirCir)
[1] 14.63382
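To compare several bandwidths side by side, here is a base-R sketch of a von Mises kernel density estimate. It is an illustration of the standard estimator, not the circular package's code, and it assumes bw plays the role of the von Mises concentration kappa (larger kappa gives a spikier estimate), which is how density.circular describes bw for its default von Mises kernel. The kappa values are arbitrary and the helper name vm_kde is hypothetical.

```r
dir_deg <- c(308,351,330,16,3,346,345,345,287,359,345,358,336,335,346,
             16,325,354,5,354,322,340,6,278,354,343,261,353,288,8)
theta_data <- dir_deg * pi / 180               # degrees -> radians

# von Mises kernel density estimate; kappa is the concentration (bandwidth)
vm_kde <- function(theta, data, kappa) {
  # exp(kappa * (cos - 1)) with an exponentially scaled Bessel I0 avoids overflow for large kappa
  norm_const <- 2 * pi * besselI(kappa, nu = 0, expon.scaled = TRUE)
  sapply(theta, function(t) mean(exp(kappa * (cos(t - data) - 1)))) / norm_const
}

grid <- seq(0, 2 * pi, length.out = 721)[-721]  # periodic grid, endpoint dropped
kappas <- c(5, 20, 80)
dens <- sapply(kappas, function(k) vm_kde(grid, theta_data, k))

matplot(grid * 180 / pi, dens, type = "l", lty = 1,
        xlab = "direction (degrees)", ylab = "density")
legend("topleft", legend = paste("kappa =", kappas), col = 1:3, lty = 1)
```

Each column of dens integrates to 1 over the circle; only the smoothness changes with kappa.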
When you run density on an object of class circular, it appears that you have to include a value for bw (bandwidth) explicitly (as the error message indicates). Try this:
plot(density(dirCir, kernel="wrappednormal", bw=0.02), ylim=c(-1,5))
See below for the graph. The ylim range is so that the plot fits inside the plot area without clipping. See the help for density.circular for more info on running the density function on circular objects.

Trying to determine why my heatmap made using heatmap.2 and using breaks in R is not symmetrical

I am trying to cluster a protein-DNA interaction dataset and draw a heatmap using heatmap.2 from the R package gplots. My matrix is symmetrical.
Here is a copy of the dataset I am using after it has been run through Pearson correlation: DataSet
Here is the complete process I follow to generate these graphs: I generate a distance matrix using a correlation measure (in my case Pearson), then pass that matrix to R and run the following code on it:
library(RColorBrewer);
library(gplots);
library(MASS);
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
# location <- args[2];
# setwd(args[2]);
pdf("result.pdf", pointsize = 15, width = 18, height = 18)
mycol <- c("blue","white","red")
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
#colors <- colorpanel(75,"midnightblue","mediumseagreen","yellow")
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
dev.off()
The issue I am having is once I use breaks to help me control the color separation the heatmap no longer looks symmetrical.
Here is the heatmap before I use breaks, as you can see the heatmap looks symmetrical:
Here is the heatmap when breaks are used:
I have played with the cutoffs for the sequences to make sure, for instance, that one sequence does not end exactly where the next begins, but I have not been able to solve this problem. I would like to use the breaks to help bring out the clusters more.
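Separately from any change in the matrix itself, the my.breaks vector above is not symmetric around 0 (its middle segment runs from about -0.6 to 0.1), so equal positive and negative values can land in differently colored bins. A sketch of symmetric breaks, assuming a range of plus or minus 5 as in my.breaks; heatmap.2 expects exactly one more break than colors.

```r
n_col <- 16
lim <- 5                                         # assumed data range, as in my.breaks
brks <- seq(-lim, lim, length.out = n_col + 1)   # one more break than colors
pal <- colorRampPalette(c("blue", "white", "red"))(n_col)

# Mirror-image values fall in mirror-image bins, so the color scale is symmetric about 0
v <- 0.0250313
bin_pos <- findInterval(v, brks)
bin_neg <- findInterval(-v, brks)
c(bin_pos, bin_neg)
```

These could then be passed as col = pal, breaks = brks; with an even number of colors no bin is pure white, so an odd n_col is worth trying if you want a neutral center.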
Here is an example of what it should look like, this image was made using cluster maker:
I don't expect it to look identical to that, but I would like it if my heatmap is more symmetrical and I had better definition in terms of the clusters. The image was created using the same data.
After some investigation I noticed that the values were changing after I ran my matrix through heatmap or heatmap.2. For example, the interaction between Pacdh-2 and pegg-2 in the provided dataset had a value of 0.0250313 before the matrix was sent to heatmap.
When I then looked at the matrix values using result$carpet, the values for the two interactions were -0.224333135 and -1.09805379.
So I decided to reorder the original matrix based on the dendrogram from the clustered matrix, so that I could be sure the values would be the same. I used the following Stack Overflow question for help:
Order of rows in heatmap?
Here is the code used for that:
rowInd <- rev(order.dendrogram(result$rowDendrogram))
colInd <- rowInd
data_ordered <- matrix_a[rowInd, colInd]
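The reordering step can be checked on a toy matrix: applying the same dendrogram-derived permutation to rows and columns of the original (unscaled) matrix keeps it symmetric and leaves every value unchanged, only moved. The data here are hypothetical, and hclust on 1 - correlation is an assumed stand-in for the clustering heatmap.2 performed.

```r
set.seed(2)
m <- cor(matrix(rnorm(300), ncol = 6))           # toy symmetric matrix (hypothetical data)
hc <- hclust(as.dist(1 - m))
ind <- rev(order.dendrogram(as.dendrogram(hc)))  # same idiom as rowInd above

m_ord <- m[ind, ind]                             # same permutation for rows and columns
isSymmetric(m_ord)
```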
I then used another program "matrix2png" to draw the heatmap:
I still have to play around with the colors but at least now the heatmap is symmetrical and clustered.
Looking into it further, the issue seems to be that I was running scale(matrix_a). When I change that line to just mtscaled <- as.matrix(matrix_a), the result looks symmetrical.
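Why scale() breaks the symmetry: it centers and scales each column by that column's own mean and standard deviation, so entry (i, j) and entry (j, i) are transformed differently. A base-R check on a toy correlation matrix (hypothetical data):

```r
set.seed(3)
m <- cor(matrix(rnorm(200), ncol = 5))   # toy symmetric correlation matrix
s <- matrix(scale(m), nrow = nrow(m))    # drop scale()'s extra attributes

isSymmetric(m)       # TRUE
isSymmetric(s)       # FALSE: each column was centered and scaled separately
max(abs(s - t(s)))   # size of the asymmetry introduced
```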
I'm certainly not the person to attempt reproducing and testing this from that strange data object without code that would read it properly, but here's an idea:
..., col=bluered(20)[4:20], ...
Here's another thought, which should give you the full range of red that the above strategy would not:
shift.BR<- colorRamp(c("blue","white", "red"), bias=0.5 )((1:16)/16)
heatmap.2( ...., col=rgb(shift.BR, maxColorValue=255), .... )
Or you can use this vector:
> rgb(shift.BR, maxColorValue=255)
[1] "#1616FF" "#2D2DFF" "#4343FF" "#5A5AFF" "#7070FF" "#8787FF" "#9D9DFF" "#B4B4FF" "#CACAFF" "#E1E1FF" "#F7F7FF"
[12] "#FFD9D9" "#FFA3A3" "#FF6C6C" "#FF3636" "#FF0000"
There was a somewhat similar question (also today) asking for a blue-to-red palette for values from -1 to 3 with white at the center. This is the code and output for that question:
test <- seq(-1,3, len=20)
shift.BR <- colorRamp(c("blue","white", "red"), bias=2)((1:20)/20)
tpal <- rgb(shift.BR, maxColorValue=255)
barplot(test,col = tpal)
(But that would seem to be the wrong direction for the bias in your situation.)
