How to draw line around significant values in R's corrplot package - r

I have been asked to produce a correlation plot for a collaborator.
My choice is to use R for the task, specifically the corrplot package.
I have been searching the internet and found multiple ways to obtain such graphics, but not the specific one I was asked for, which is puzzling me. As you can see in the picture, the significant values are highlighted by drawing a square around the significant tile.
Example of the correlation plot required
The closest result I have achieved uses the code below, but I cannot find an option to draw a line around the significant tiles (if one exists).
# Insignificant correlations are left blank
corrplot(res3$r, type="upper", order="hclust",
p.mat = res3$P, sig.level = 0.01, insig = "blank")
I tried adding the "addrect" parameter but it didn't work.
# Same plot with addrect = 2 added (insignificant correlations are still left blank)
corrplot(res3$r, type="upper", order="hclust", p.mat = res3$P,
addrect=2, sig.level = 0.01, insig = "blank")
Any help will be appreciated.

corrplot allows you to add new plots to an already existing one. Therefore, once you've created the plot of the initial correlation matrix, you can simply add those cells that you want to highlight in an iterative manner using corrplot(..., add = TRUE).
The only thing required to achieve your goal is a vector of indices (which I called 'ids') that tells R which cells to highlight. Note that, for simplicity, I took a random sample of cells from the initial correlation matrix, but something like ids <- which(p.value < 0.01) (assuming that you've stored your p-values in a separate vector or matrix) would work similarly.
library(corrplot)
## create and visualize correlation matrix
data(mtcars)
M <- cor(mtcars)
corrplot(M, cl.pos = "n", na.label = " ")
## select cells to highlight (e.g., statistically significant values)
set.seed(10)
ids <- sample(1:length(M), 15L)
## duplicate correlation matrix and reject all irrelevant values
N <- M
N[-ids] <- NA
## add significant cells to the initial corrplot iteratively
for (i in ids) {
  O <- N
  O[-i] <- NA
  corrplot(O, cl.pos = "n", na.label = " ", addgrid.col = "black", add = TRUE,
           bg = "transparent", tl.col = "transparent")
}
Note that you could also add all values to highlight in one go (i.e., without requiring a for loop) using corrplot(N, ...), but in that case, an undesirable black margin is drawn all around the plotting area.
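As a rough sketch of the which(p.value < 0.01) variant mentioned above, the indices could be derived from a matrix of p-values. The matrix P and the pairwise cor.test() loop are assumptions made for illustration (the answer itself only uses a random sample). Only base R is used here, and the resulting ids can be fed to the same for loop.

```r
## Build a symmetric matrix of p-values for all pairs of mtcars columns.
## P is a hypothetical name; the diagonal is left as NA on purpose.
vars <- names(mtcars)
P <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
for (pair in combn(vars, 2, simplify = FALSE)) {
  p <- cor.test(mtcars[[pair[1]]], mtcars[[pair[2]]])$p.value
  P[pair[1], pair[2]] <- p
  P[pair[2], pair[1]] <- p
}

## Linear indices of significant cells; which() silently skips the NA diagonal.
ids <- which(P < 0.01)
```

Because P has the same shape and column order as cor(mtcars), these linear indices should address the same cells in M.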

Related

R - Meta-Analysis - Plotting forest plot from multi-level random-effects model with subgroups

I am having trouble with plotting a forest plot based on a multi-level model, in which I'd also like to display pooled effects of subgroups, as well as the results for subgroup differences.
So far, I have managed to produce a plot of the data where clusters are grouped together. I would like to extend this plot by adding pooled effects of subgroups at the right positions, without losing the grouping of the clusters. (As it is explained here, but also while keeping what is shown in the last example of this).
This is the code I have used so far to produce the "normal" forest plot for my model (sorry, it's pretty long):
# ma_data => my data
# main_3L => my multi-level model
# Prepare row argument for separation by study
dd <- c(0, diff(ma_data$ID))
dd[dd > 0] <- 1
rows <- (1:main_3L$k) + cumsum(dd)
par(tck=-.01, mgp = c(1.6,.2,0), cex=1)
# refactor ID var
ma_data$ID_plot <- substr(ma_data$short_cite, 1, nchar(ma_data$short_cite))
ma_data$ID_plot <- paste(sub(" ||) ","",substr(ma_data$ID_plot,0,2)), substr(ma_data$ID_plot,3,nchar(ma_data$ID_plot)), sep="")
tiff("./figures/forestFull_ext1.tiff", width=3200,height=4500, res=300)
# Plot the forest!
metafor::forest(main_3L,
addpred = TRUE, # adds prediction interval
cex=0.5,
header="Author(s) and Year",
rows=rows, # uses the vector created above
order=order(ma_data$ID, ma_data$es_adj),
ylim=c(0.5,max(rows)+3),
xlim=c(-5,3),
xlab="Hedges' G",
ilab=cbind(as.character(ma_data$setup),as.character(ma_data$target_1), as.character(ma_data$measure_type), ma_data$task, as.character(ma_data$cogdom_pooled), ma_data$sample_size_exp),
ilab.xpos=c(-3.9,-3.6,-3.3,-2.8,-2.2,-1.7),
slab=ma_data$ID_plot,
mlab = mlabfun("Overall RE Modell", main_3L, main_3L.I2)) # Adds Q,Qp, I² and sigma² values.
abline(h = rows[c(1,diff(rows)) == 2] - 1, lty="dotted")
# adds a second polygon with robust estimates for standard error
addpoly(coeftest.main_3L$beta, sei = coeftest.main_3L$SE,
rows = -2.5,
cex = 0.5,
mlab = "Robust RE Model estimate",
col = "darkred")
par(cex=0.5, font=2)
# text(c(-4,-3.7,-3.2,-2.5, -2), 150.5, pos=3, c("Target", "Measure","Task","Cognitive Domain", "N"))
text(c(-3.9,-3.6,-3.3,-2.8,-2.2,-1.7), 150.5, pos=3, c("Setup", "Target", "Measure","Task","Cognitive Domain", "N"))
dev.off()
Specifically, I need to know how to "make space" for the additional rows and polygons.
Also, is there an option in the forest() function to display only the pooled effects of subgroups and the main effect, but not the individual effect sizes? I know that it is possible in the meta package, but have not found anything similar in metafor.
Any help is greatly appreciated!

How to color the background of a corrplot by group?

Consider this data, where we have several groups with 10 observations each, and we conduct a pairwise.t.test():
set.seed(123)
data <- data.frame(group = rep(letters[1:18], each = 10),
var = rnorm(180, mean = 2, sd = 5))
ttres <- pairwise.t.test(x=data$var, g=data$group, p.adjust.method = "none")#just to make sure i get some sigs for the example
Now let's get the matrix of p-values, convert it to a binary matrix marking significant and non-significant values, and plot it with corrplot(), so that we can visualize which groups are different:
library(corrplot)
pmat <- as.matrix(ttres$p.value)
pmat<-round(pmat,2)
pmat <- +(pmat <= 0.1)
pmat
corrplot(pmat, insig = "blank", type = "lower")
Does anyone know a way to color the background of each square according to a grouping label? For instance, say we want the squares for groups a:g to be yellow, the squares for groups h:n to be blue, and the squares for groups o:r to be red. Or is there an alternative way to do this with ggplot?
You can pass a vector of background colors via the bg= parameter. The trick is just making sure they are in the right order. Here's one way to do that:
bgcolors <- matrix("white", nrow(pmat), ncol(pmat),dimnames = dimnames(pmat))
bgcolors[1:6, ] <- "yellow"
bgcolors[7:13, ] <- "blue"
bgcolors[14:17, ] <- "red"
bgcolors <- bgcolors[lower.tri(bgcolors, diag=TRUE)]
corrplot(pmat, insig = "blank", type = "lower", bg=bgcolors)
Basically we just make a matrix the same shape as our input, then we set the colors we want for the different rows, and then we just pass the lower triangle of that matrix to the function.
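To see what "the right order" means here, a toy example (not part of the original answer) shows how lower.tri() indexing flattens a matrix column by column. The names toy and picked are made up for illustration, and the assumption that corrplot consumes bg in this order rests on the answer above.

```r
## 3 x 3 matrix in which every cell holds the label of its row.
toy <- matrix(c("row1", "row2", "row3"), nrow = 3, ncol = 3)

## Extract the lower triangle (including the diagonal) as a plain vector.
## Elements come out column by column: column 1 (rows 1-3),
## then column 2 (rows 2-3), then column 3 (row 3).
picked <- toy[lower.tri(toy, diag = TRUE)]
print(picked)
```

So a color assigned to a whole row of the matrix ends up at every position in the vector that belongs to a cell of that row.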

R corrplot - color relying on value

I have a binary data.frame (53115 rows; 520 columns) and I want to plot a correlation plot. I want to colour it based on the values, correlation values >=0.95 (red), otherwise, blue.
correl <- abs(round(cor(bin_mat), 2))
pdf("corrplot.pdf", width = 200, height = 200)
a <- corrplot(correl, order = "hclust", addCoef.col = "black", number.cex=0.8, cl.lim = c(0,1), col=c(rep("deepskyblue",19) ,"red"))
dev.off()
I get the correlation plot but in many cases I get a wrong coloring (see below on 0.91).
data: file
How can I get the correct coloring?
In general, the corrplot library is quite weird when it comes to cl.lim and colors. For some reason it doesn't seem to matter whether you set cl.lim or not: the colors will still be distributed from -1 to 1.
So in your case just use 39 blue colors instead of 19 (to cover the range from -1 to 1):
cors <- cor(iris[,-5])
cors[cbind(c(1,2), c(2,1))] <- 0.912
corrplot(cors, col=c(rep("blue", 39), "red"), cl.lim=c(-1,1), addCoef.col="black")
And the result:
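The arithmetic behind 39 versus 19 blue entries can be checked with a small base R sketch. It assumes, as described above, that the supplied colors are spread evenly over the range -1 to 1; bin_index is a hypothetical helper and not a corrplot function.

```r
## Which of n_colors equally wide bins on [-1, 1] does a value fall into?
bin_index <- function(x, n_colors) {
  breaks <- seq(-1, 1, length.out = n_colors + 1)
  findInterval(x, breaks, rightmost.closed = TRUE)
}

## 20 colors: each bin is 0.1 wide, so the last (red) bin starts at 0.9.
idx20 <- bin_index(0.91, 20)

## 40 colors: each bin is 0.05 wide, so the last (red) bin starts at 0.95.
idx40_low  <- bin_index(0.91, 40)
idx40_high <- bin_index(0.97, 40)

print(c(idx20, idx40_low, idx40_high))
```

Under that assumption, 0.91 lands in the last of 20 bins (hence red), but in the second-to-last of 40 bins (blue), while 0.97 still lands in the last one.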

How to plot an nmds with coloured/symbol points based on SIMPROF

Hi, I am trying to plot an NMDS of assemblage data, based on a Bray-Curtis dissimilarity matrix, in R. I have been able to apply ordiellipse() and ordihull(), and even change the colours based on group factors created by cutree() on the result of hclust(),
e.g. using the dune data from the vegan package:
library(vegan)
data(dune)
Dune.dis <- vegdist(dune, method = "bray")
Dune.mds <- metaMDS(dune, distance = "bray", k = 2)
#hierarchical cluster
clua <- hclust(Dune.dis, "average")
plot(clua, hang = -1)
# set groupings
rect.hclust(clua, 4)
grp <- cutree(clua, 4)
#plot mds
plot(Dune.mds, display = "sites", type = "text", cex = 1.5)
#show groupings
ordiellipse(Dune.mds, groups = grp, border = 1, col = "red", lwd = 3)
or even colour the points just by the cutree groups:
colvec <- c("red2", "cyan", "deeppink3", "green3")
colvec[grp]
plot(Dune.mds, display = "sites", type = "text", cex = 1.5) #or use type = "points"
points(Dune.mds, display = "sites", col = colvec[grp], bg = colvec[grp], pch = 21)
However, what I really want to do is use the simprof function from the "clustsig" package and then colour the points based on the significant groupings. This is more of a technical coding question: I am sure I could create a vector of factors by hand, but I expect there is a more efficient way to do it.
Here's my code so far for that:
simp <- simprof(Dune.dis, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "braycurtis", alpha = 0.05, sample.orientation = "row")
#plot dendrogram
simprof.plot(simp, plot = TRUE)
Now I am just not sure how to do the next step, plotting the NMDS using the groupings defined by SIMPROF. How do I turn the SIMPROF results into a grouping factor without literally typing it out myself?
Thanks in advance.
You wrote that you know how to get colours from an hclust object with cutree. Then read the documentation of clustsig::simprof. It says that simprof returns an hclust object within its result object. It also returns numgroups, which is the suggested number of clusters. Now you have all the information you need to use cutree on an hclust object, which you already know how to do. If your simprof result is called simp, use cutree(simp$hclust, simp$numgroups) to extract the integer vector corresponding to the clustsig::simprof result, and use this vector to index your colours.
I have never used simprof or clustsig, but I gathered all this information from its documentation.
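A minimal sketch of that idea follows, using only base R so that it runs without clustsig installed. The list simp here is a hand-built stand-in that merely mimics the two components named above (hclust and numgroups); the USArrests data, the fixed value 4 and the cmdscale() ordination are placeholders too. With a real simprof result, only the cutree line and the colour indexing should carry over.

```r
## Hypothetical stand-in for a simprof() result: a list holding an hclust
## object and a suggested number of groups (4 is an arbitrary choice here).
d <- dist(scale(USArrests[1:20, ]))
simp <- list(hclust = hclust(d, method = "average"), numgroups = 4)

## Integer group membership per observation, as suggested in the answer.
grp <- cutree(simp$hclust, k = simp$numgroups)

## One colour per group, then one colour per observation.
colvec <- rainbow(simp$numgroups)
site_cols <- colvec[grp]

## Placeholder ordination (classical MDS) just to show the colouring.
xy <- cmdscale(d, k = 2)
plot(xy, pch = 21, col = site_cols, bg = site_cols,
     xlab = "Axis 1", ylab = "Axis 2")
```

In the question's setting, site_cols would play the role of colvec[grp] in the points() call on the metaMDS result.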

R Correlation Plots number instead of Text

When I plot the correlation matrix, the column names are not displayed; numbers are shown instead.
Why does this happen, and how can I fix it?
Below is the code:
espAlltmNum <- espAlltm[, sapply(espAlltm, is.numeric)]
#above dataset is created as correlation needs only numeric columns
M <- cor(espAlltmNum,use = "pairwise", method = "pearson")
corrplot(M, method = "circle",tl.pos = "d", tl.cex = 0.5, tl.col = 'black',
order = "hclust", diag = TRUE,title = "Correlation Plot"
, mar=c(1,1,1,1))
the output is:
I see an interaction between the corrplot and GGally packages. If the correlation matrix is computed before the GGally package is loaded, the matrix contains the column names (as text).
If the correlation matrix is computed after the GGally package is loaded, the matrix contains the index numbers of the columns instead. The plot then shows the index numbers too, as in the image attached above...
