Heatplot/enrichGO, fold change display & order of genes - r

Hello, I'm working with the heatplot function from the enrichplot package on the results of a GO over-representation test (in R).
I was wondering if I could get some help with the following issues:
In what format should 'geneList' in the 'foldChange=geneList' argument be? I can only get black-and-white results, not the colour-scaled ones. When I passed the list of my DE genes (in my case 'fpkm_allGDE_up_345$Ens_ID_merge'), I got the message: 'Error: Discrete value supplied to continuous scale'.
It only worked when I set 'foldChange=fpkm_allGDE_up$FOLD_CHANGE', but then there was no colour scale.
Is it possible to rank the genes on the x axis so that genes associated with more of the displayed GO terms come first?
Here's what I tried:
egoall_up <- enrichGO(gene = fpkm_allGDE_up_345$Ens_ID_merge,  # DE genes post-treatment
                      universe = fpkm_allG_345$Ens_ID_merge,   # all genes expressed
                      keyType = "ENSEMBL",
                      OrgDb = org.Hs.eg.db,
                      ont = "ALL",
                      pAdjustMethod = "BH",
                      pvalueCutoff = 0.01,
                      qvalueCutoff = 0.05,
                      readable = TRUE)
heatplot(egoall_up, foldChange=fpkm_allGDE_up$FOLD_CHANGE, showCategory=10)
head(fpkm_allGDE_up)
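On the format question: in the clusterProfiler/enrichplot examples, geneList is a named numeric vector, with fold changes as values and gene IDs as names. A character vector of IDs would explain the "Discrete value supplied to continuous scale" error, and an unnamed numeric vector gives heatplot nothing to match against the genes, which would explain the missing colours. The following is a minimal sketch only: the data frame is a hypothetical stand-in, and it assumes the ID and fold-change columns sit in the same table. Because readable = TRUE displays gene symbols, it may be worth checking which ID type the names need to use.

```r
# Hypothetical stand-in for the asker's table; column names follow the question
fpkm_allGDE_up <- data.frame(
  Ens_ID_merge = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618"),
  FOLD_CHANGE  = c(2.4, 3.1, 1.8),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are fold changes, names are gene IDs
geneList <- setNames(fpkm_allGDE_up$FOLD_CHANGE, fpkm_allGDE_up$Ens_ID_merge)

# The package examples keep the vector sorted in decreasing order
geneList <- sort(geneList, decreasing = TRUE)

# Assumed usage with the enrichGO result from the question (not run here):
# heatplot(egoall_up, foldChange = geneList, showCategory = 10)
```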


Select argument doesn't work on cca objects

I created an object of class cca in vegan and now I am trying to tidy up the triplot. However, I seemingly can't use the select argument to only show specified items.
My code looks like this:
data("varechem")
data("varespec")
ord <- cca(varespec ~ Al+S, varechem)
plot(ord, type = "n")
text(ord, display = "sites", select = c("18", "21"))
I want only the two specified sites (18 and 21) to appear in the plot, but when I run the code nothing happens. I do not even get an error message.
I'm really stuck, but I am fairly certain that this bit of code is correct. Can someone help me?
I can't recall now, but I don't think the intention was to allow "names" to select which rows of the scores should be selected. The documentation speaks of select being a logical vector, or indices of the scores to be selected. By indices it was meant numeric indices, not rownames.
The example fails because select is also used to subset the labels character vector of values to be plotted in text(), and this labels character vector is not named. Using a character vector to subset another vector requires that the other vector be named.
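The subsetting behaviour described here can be reproduced in base R without vegan. This is a minimal sketch; the labels vector is a hypothetical stand-in for the unnamed character vector used inside text().

```r
# Unnamed character vector, standing in for the labels used inside text()
labels <- c("18", "15", "24", "21")

# Character subscripts look up names; an unnamed vector has none, so this yields NA
by_name <- labels[c("18", "21")]

# Numeric or logical subscripts work whether or not the vector is named
by_index   <- labels[which(labels %in% c("18", "21"))]
by_logical <- labels[labels %in% c("18", "21")]
```

Both the numeric and the logical form return the two wanted labels, which is why they are the safe choice for select.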
Your example works if you do:
data("varechem")
data("varespec")
ord <- cca(varespec ~ Al + S, varechem)
plot(ord, type = "n")
take <- which(rownames(varechem) %in% c("18", "21"))
# or
# take <- rownames(varechem) %in% c("18", "21")
text(ord, display = "sites", select = take)
I'll have a think about whether it will be simple to support the use case of your example.
The following code probably gives the result you want to achieve:
First, create an object to store the blank CCA1-CCA2 plot
p1 = plot(ord, type = "n")
Find and then save the coordinates of the sites 18 and 21
p1$sites[c("18", "21"),]
# CCA1 CCA2
#18 0.3496725 -1.334061
#21 -0.8617759 -1.588855
site18 = p1$sites["18",]
site21 = p1$sites["21",]
Overlay the blank CCA1-CCA2 plot with the points of site 18 and 21. Setting different colors to different points might be a good idea.
points(p1$sites[c("18", "21"),], pch = 19, col = c("blue", "red"))
Showing labels might be informative.
text(x = site18[1], y = site18[2] + 0.3, labels = "site 18")
text(x = site21[1], y = site21[2] + 0.3, labels = "site 21")
Here is the resulting plot.

Change name of groups in bal.plot

I am trying to visualize results from a MatchIt procedure with bal.plot() from the cobalt package.
It works just fine, except that I would like to change the labels for the groups, which by default are "Unadjusted sample" and "Adjusted sample".
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS", which = "both",
type = "histogram", mirror = F,
weights = AHEAD_nomiss$att.weights, treat = AHEAD_nomiss$group)
Author of cobalt package here. Thank you for using my package!
Edit. Original post at the bottom.
I just added some functionality to bal.plot for this in the development version of cobalt, which can be installed with devtools::install_github("ngreifer/cobalt"). Use the sample.names argument to supply a vector of names to give bal.plot and they'll appear in the facet labels. The vector should be as long as the number of samples (in your case, 2). Your new code should look like this:
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS", which = "both",
type = "histogram", mirror = F,
weights = AHEAD_nomiss$att.weights, treat = AHEAD_nomiss$group,
sample.names = c("UNWEIGHTED", "WEIGHTED"))
Of course you can change the names. If you don't want to install the development version of cobalt (it's not guaranteed to be stable), you can use my solutions below.
I didn't intend bal.plot to be used for publication so I didn't make it super flexible, unlike love.plot. One thing you can do is manually program the histograms using ggplot2. Of course, this requires you learning how to use ggplot2, which can be a challenge, and looking at the source code of bal.plot probably won't help because of all the checks and transformations that occur. Here's some code that might work for you:
unweighted <- data.frame(KCH_TKS = AHEAD_nomiss$KCH_TKS,
treat = factor(AHEAD_nomiss$group),
weights = 1,
adj = "UNWEIGHTED",
stringsAsFactors = FALSE)
weighted <- data.frame(KCH_TKS = AHEAD_nomiss$KCH_TKS,
treat = factor(AHEAD_nomiss$group),
weights = AHEAD_nomiss$att.weights,
adj = "WEIGHTED",
stringsAsFactors = FALSE)
data <- rbind(unweighted, weighted)
library(ggplot2)
ggplot(data, aes(x = KCH_TKS, fill = treat)) +
  geom_histogram(aes(weight = weights), bins = 10, alpha = .4, color = "black") +
  facet_grid(~adj)
One way you can hack bal.plot is to provide a set of weights that are all equal to 1 as well as your desired weights and leave which at its default. If you give the weights names, those names will appear on the facet labels. So, for your example, try
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS",
         type = "histogram", mirror = F,
         weights = list(UNWEIGHTED = rep(1, nrow(AHEAD_nomiss)),
                        WEIGHTED = AHEAD_nomiss$att.weights),
         treat = AHEAD_nomiss$group)
You should see that "UNWEIGHTED" and "WEIGHTED" are the new facet label names. You can of course change them to be whatever you want.

Superheat Use Given Cluster Group

I generated a dendrogram and obtained k clusters. I want to draw a heatmap using the existing dendrogram and clusters. I want to display the heatmap across multiple pages, so I drew each one in a separate call. To make the colour scale comparable, I also set heat.pal.values. Here’s my code sample:
sh <- superheat(X = cbind(dt[grep("Action", colnames(dt))], profile),
heat.pal = brewer.pal(8, 'RdBu')[8:1],
heat.pal.values = c(-1.4:2.3),
bottom.label = 'variable',
grid.hline.col = '#F0EBEB',
grid.vline.col = '#F0EBEB',
smooth.heat = TRUE,
membership.rows = hc$cluster,
column.title = 'factors', row.title = 'Clusters',
bottom.label.text.size = 1.5, bottom.label.text.angle = 90)
sh$membership.rows == hc$cluster
When I then checked whether the memberships match, it displayed a mix of TRUE and FALSE. I couldn’t find out what’s wrong with my code. Would you please enlighten me?

How to draw line around significant values in R's corrplot package

I have been asked to produce a correlation plot for a collaborator.
My choice is to use R for the task, specifically the corrplot package.
I have searched the internet and found multiple ways to produce such graphics, but not the specific graphic I was asked for (as you can see in the picture, the significant values are highlighted by drawing a square around the significant tile), which is puzzling me.
Example of the correlation plot required
The closest result I have achieved uses the code below, but I cannot find an option to draw a line around the significant tiles (if one exists).
# Insignificant correlations are left blank
corrplot(res3$r, type="upper", order="hclust",
p.mat = res3$P, sig.level = 0.01, insig = "blank")
I tried adding the "addrect" parameter but it didn't work.
# Same call with addrect added; insignificant correlations are still left blank
corrplot(res3$r, type="upper", order="hclust", p.mat = res3$P,
addrect=2, sig.level = 0.01, insig = "blank")
Any help will be appreciated.
corrplot allows you to add new plots to an already existing one. Therefore, once you've created the plot of the initial correlation matrix, you can simply add those cells that you want to highlight in an iterative manner using corrplot(..., add = TRUE).
The only thing required to achieve your goal is an index vector (which I called 'ids') to tell R which cells to highlight. Note that, for simplicity, I took a random sample of the initial correlation matrix, but something like ids <- which(p.value < 0.01) (assuming that you've stored your significance levels in a separate vector) would work similarly.
library(corrplot)
## create and visualize correlation matrix
data(mtcars)
M <- cor(mtcars)
corrplot(M, cl.pos = "n", na.label = " ")
## select cells to highlight (e.g., statistically significant values)
set.seed(10)
ids <- sample(1:length(M), 15L)
## duplicate correlation matrix and reject all irrelevant values
N <- M
N[-ids] <- NA
## add significant cells to the initial corrplot iteratively
for (i in ids) {
  O <- N
  O[-i] <- NA
  corrplot(O, cl.pos = "n", na.label = " ", addgrid.col = "black", add = TRUE,
           bg = "transparent", tl.col = "transparent")
}
Note that you could also add all values to highlight in one go (i.e., without requiring a for loop) using corrplot(N, ...), but in that case, an undesirable black margin is drawn all around the plotting area.
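The random sample of ids used above can be swapped for the indices of the significant cells. The following is a minimal sketch using base R's cor.test() on mtcars; p_mat is a hypothetical name, and the 0.01 threshold is taken from the question.

```r
# Correlation matrix, as in the example above
M <- cor(mtcars)

# Pairwise p-values from stats::cor.test; the diagonal is left as NA
n <- ncol(mtcars)
p_mat <- matrix(NA_real_, n, n, dimnames = dimnames(M))
for (i in seq_len(n - 1)) {
  for (j in seq(i + 1, n)) {
    p <- cor.test(mtcars[[i]], mtcars[[j]])$p.value
    p_mat[i, j] <- p
    p_mat[j, i] <- p
  }
}

# Linear indices of the significant cells; which() skips the NA diagonal
ids <- which(p_mat < 0.01)
```

Because ids holds linear indices into a matrix with the same shape as M, it can be used directly in N[-ids] <- NA and in the for loop.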

R Circular Chord Plots

I'm learning how to create circular plots in R, similar to CIRCOS.
I'm using the circlize package to draw links between origin and destination pairs based on whether the flight was OB, Inbound or Return. The logic of the data doesn't really matter; it's just a toy example.
I have got the plot to work with the code below, which follows this logic:
Take my data and combine the destination column with the flight type
Convert to a matrix and feed the origin and the new column into circlize
Reference
library(dplyr)
library(circlize)
# Create Fake Flight Information in a table
orig = c("IE","GB","US","ES","FI","US","IE","IE","GB")
dest = c("FI","FI","ES","ES","US","US","FI","US","IE")
direc = c("IB","OB","RETURN","DOM","OB","DOM","IB","RETURN","IB")
mydf = data.frame(orig, dest, direc)
# Add a column that combines the dest and direction together
mydf <- mydf %>%
  mutate(key = paste(dest, direc)) %>%
  select(orig, key)
# Create a Binary Matrix Based on mydf
mymat <- data.matrix(as.data.frame.matrix(table(mydf)))
# create the objects you want to link from to in your diagram
from <- rownames(mymat)
to <- colnames(mymat)
# Create Diagram by suppling the matrix
par(mar = c(1, 1, 1, 1))
chordDiagram(mymat, order = sort(union(from, to)), directional = TRUE)
circos.clear()
I like the plot a lot but would like to change it a little. For example, FI (which is Finland) has 3 entries on the diagram: FI IB, FI OB and FI. I would like to combine them all under FI if possible and distinguish between the three types of flights using a colour scheme, arrows, or even an additional track that acts as an umbrella for IB, OB and RETURN flights.
So for Example,
FI OB would be placed in FI but have a one way arrow to GB to signify OB
FI IB would be placed in FI but have a one way arrow into FI
FI RETURN (if it exists) would have a double headed arrow
Can anyone help? Has anyone seen anything similar done before?
The end result should show each country on the plot only once, so that someone can see very quickly which countries have the most flights.
I have tried following other posts, but I'm afraid I get lost when they move on to the more advanced material.
Thank you very much for your time
First, I think there is a duplicated record (IE-FI-IB) in your data.
I will first attach the code and figure and then explain a little bit.
df = data.frame(orig, dest, direc, stringsAsFactors = FALSE)
df = unique(df)
col = c("IB" = "red",
"OB" = "blue",
"RETURN" = "orange",
"DOM" = "green")
directional = c("IB" = -1,
"OB" = 1,
"RETURN" = 2,
"DOM" = 0)
diffHeight = c("IB" = -0.04,
"OB" = 0.04,
"RETURN" = 0,
"DOM" = 0)
chordDiagram(df[1:2], col = col[df[[3]]], directional = directional[df[[3]]],
direction.type = c("arrows+diffHeight"),
diffHeight = diffHeight[df[[3]]])
legend("bottomleft", pch = 15, legend = names(col), col = col)
First, you need to use the development version of circlize, which you can install with
devtools::install_github("jokergoo/circlize")
In this new version, chordDiagram() accepts a data frame as input and can draw two-headed arrows for the links (added just after reading your post :)).
In the code above, col, directional, direction.type and diffHeight can each be set as a vector whose elements correspond to the rows of df.
When the directional argument of chordDiagram() is set to 2, the corresponding link has two directions. If direction.type then contains arrows, a two-headed arrow is drawn.
Since diffHeight is a vector that corresponds to the rows of df, if you want to show the direction of a single link both by an arrow and by an offset of the roots, you need to merge these two options into a single string, as in the example code: "arrows+diffHeight".
By default, links point from the first column to the second. In your case IB means the reverse direction, so diffHeight is set to a negative value to reverse the default direction.
Finally, I see you have links that start and end in the same sector (ES-ES-DOM and US-US-DOM). You can use the self.link argument to control how such self-links are drawn; self.link is set to 1 in the following figure.
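The per-row vectors passed to chordDiagram() above come from indexing a named lookup vector with the flight-type column. A minimal base R sketch of that step, using the toy data from the question (link_col is a hypothetical name):

```r
# Toy flight data from the question
orig  <- c("IE", "GB", "US", "ES", "FI", "US", "IE", "IE", "GB")
dest  <- c("FI", "FI", "ES", "ES", "US", "US", "FI", "US", "IE")
direc <- c("IB", "OB", "RETURN", "DOM", "OB", "DOM", "IB", "RETURN", "IB")

# Drop the duplicated IE-FI-IB record, as in the answer
df <- unique(data.frame(orig, dest, direc, stringsAsFactors = FALSE))

# Named lookup vector: one colour per flight type
col <- c(IB = "red", OB = "blue", RETURN = "orange", DOM = "green")

# Indexing by the type column expands the lookup to one colour per row of df
link_col <- col[df[[3]]]
```

The same indexing pattern produces the per-row directional and diffHeight vectors.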
Do you need the arrows? The colour coding in the graph already tells the from/to story: each link takes the colour of its origin country, so a link arriving at a destination carries the origin's colour, and when origin and destination are the same, the link returns to its own base in its own colour (see US or ES, for example).
library(dplyr)
library(circlize)
# Create Fake Flight Information in a table
orig = c("IE","GB","US","ES","FI","US","IE","IE","GB")
dest = c("FI","FI","ES","ES","US","US","FI","US","IE")
mydf = data.frame(orig, dest)
# Create a Binary Matrix Based on mydf
mymat <- data.matrix(as.data.frame.matrix(table(mydf)))
# create the objects you want to link from to in your diagram
from <- rownames(mymat)
to <- colnames(mymat)
# Create Diagram by suppling the matrix
par(mar = c(1, 1, 1, 1))
chordDiagram(mymat, order = sort(union(from, to)), directional = TRUE)
circos.clear()
By the way, there is also an offset difference at the edge that tells you whether a link end is FROM (wider edge) or TO (smaller edge).
