Set common y axis limits from a list of ggplots - r

I am running a function that returns a custom ggplot from input data (in fact, a plot with several layers). I run the function over several different input datasets and obtain a list of ggplots.
I want to arrange these plots in a grid to compare them, but they all have different y axes.
I guess what I have to do is extract the maximum and minimum y-axis limits from the ggplot list and apply those to each plot in the list.
How can I do that? I guess it's through ggplot_build(). Something like this:
test = ggplot_build(plot_list[[1]])
> test$layout$panel_scales_x
[[1]]
<ScaleContinuousPosition>
Range:
Limits: 0 -- 1
I am not familiar with the structure of a ggplot_build object, and maybe this one in particular is not a standard one because it comes from a "custom" ggplot.
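The y-axis counterpart of the `panel_scales_x` output above lives in `layout$panel_scales_y`, and the range actually drawn on the axis (including ggplot2's default expansion) is in `layout$panel_params`. A minimal sketch on a plain single-panel ggplot, using the built-in `mtcars` data as a stand-in; a composite object such as the full gseaplot2 output may not expose these fields in the same way.

```r
library(ggplot2)

# Stand-in plot: a plain single-panel ggplot
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
built <- ggplot_build(p)

# Trained y scale of the first panel; its range is the raw data range
y_scale <- built$layout$panel_scales_y[[1]]
y_data_range <- y_scale$range$range   # same as range(mtcars$mpg)

# Range actually drawn on the y axis, including the default expansion
y_drawn_range <- built$layout$panel_params[[1]]$y.range

print(y_data_range)
print(y_drawn_range)
```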
For reference, these plots are created with the gseaplot2 function from the enrichplot package.
I don't know how to "upload" an R object, but if that would help, let me know how to do it.
Thanks!
Edit after comments (thanks for your suggestions!)
Here is an example of a gseaplot2 plot. GSEA stands for Gene Set Enrichment Analysis, a technique used in genomic studies. The gseaplot2 function calculates a running enrichment score, plots it, and adds a bar plot underneath.
And here is the grid I create to compare the plots generated from different data:
I would like to have a common scale for the "Running Enrichment Score" part.
I guess I could try to recreate the gseaplot2 function, input all of the datasets, and then create the grid with facet_wrap, but I was wondering if there is an easy way to extract parameters from a plot list.
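The facet_wrap() route gives a shared y axis for free, because facets use one fixed y scale by default (`scales = "fixed"`). A minimal sketch, assuming each dataset can be turned into a data frame with the same columns; subsets of the built-in `mtcars` data stand in for the real GSEA results, and the `source` column is a hypothetical identifier.

```r
library(ggplot2)

# Stand-in datasets: one data frame per group
datasets <- split(mtcars, mtcars$cyl)

# Stack them into one data frame, tagging each row with its source
combined <- do.call(rbind, lapply(names(datasets), function(nm) {
  transform(datasets[[nm]], source = nm)
}))

# facet_wrap() uses a single, shared y scale unless scales = "free_y"
p <- ggplot(combined, aes(wt, mpg)) +
  geom_point() +
  facet_wrap(~ source)

# One trained y scale serves every panel
built <- ggplot_build(p)
length(built$layout$panel_scales_y)  # 1
```

The trade-off is that the plotting code has to be rebuilt around the combined data instead of reusing the existing plot-making function.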
As a reproducible example (from the enrichplot package):
library(clusterProfiler)
library(enrichplot)  # provides gseaplot2()
library(magrittr)    # provides the %>% pipe used below
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
wpgmtfile <- system.file("extdata/wikipathways-20180810-gmt-Homo_sapiens.gmt", package="clusterProfiler")
wp2gene <- read.gmt(wpgmtfile)
wp2gene <- wp2gene %>% tidyr::separate(term, c("name","version","wpid","org"), "%")
wpid2gene <- wp2gene %>% dplyr::select(wpid, gene) #TERM2GENE
wpid2name <- wp2gene %>% dplyr::select(wpid, name) #TERM2NAME
ewp2 <- GSEA(geneList, TERM2GENE = wpid2gene, TERM2NAME = wpid2name, verbose=FALSE)
gseaplot2(ewp2, geneSetID=1, subplots=1:2)
And this is how I generate the plot list (probably there is a much more elegant way):
library(ggpubr)  # provides ggarrange()
plot_list <- list()
for (i in 1:3) {
  # one GSEA plot (running score + gene hits) per gene set
  plot_list[[i]] <- gseaplot2(ewp2, geneSetID = i, subplots = 1:2)
}
ggarrange(plotlist = plot_list)
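A sketch of the approach described in the question: read the y limits of every plot in the list, take the overall minimum and maximum, and add the same `coord_cartesian(ylim = ...)` to each plot. It assumes each list element is a single-panel ggplot. With `subplots = 1:2`, gseaplot2 returns a composite object, so this would likely need to be applied to the running-score panel on its own (for example, plots made with `subplots = 1`); that case is untested here. `mtcars` stands in for the real data.

```r
library(ggplot2)

# Stand-in list of plots with different y ranges, one per cylinder count
plot_list <- lapply(c(4, 6, 8), function(n) {
  ggplot(subset(mtcars, cyl == n), aes(wt, mpg)) + geom_point()
})

# Trained y limits of each plot (first panel), taken from the built plot
y_limits <- lapply(plot_list, function(p) layer_scales(p)$y$get_limits())

# Overall minimum and maximum across all plots
common_y <- range(unlist(y_limits))

# coord_cartesian() zooms without dropping data, unlike scale limits
plot_list_common <- lapply(plot_list, function(p) {
  p + coord_cartesian(ylim = common_y)
})
```

The adjusted list can then be passed to `ggarrange(plotlist = plot_list_common)` as before.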

Related

Represent a colored polygon in ggplot2

I am using the spatstat package because I am working on spatial patterns.
I would like to reproduce the following graph, produced by the plot.quadrattest function, in ggplot2, using colors instead of numbers (the numbers are not very readable):
[Image: quadrat test plot]
The numbers that interest me for the intensity of the colors are those at the bottom of each box.
The test object contains the following data:
[Image: contents of the test object]
I have looked at the function's help page and its source code, but I still cannot manage it.
Ideally, I would like my final figure to look like this (maybe not with the same colors, haha):
[Image: desired final figure]
Thanks in advance for your help.
Please provide a reproducible example in the future.
The package reprex may be very helpful.
To do this with ggplot2, my best bet would be to convert the
spatstat objects to sf and plot them that way,
but that may take some time. If you are willing to use base
graphics and spatstat, you could do something like:
library(spatstat)
# Data (using a built-in dataset):
X <- unmark(chorley)
plot(X, main = "")
# Test:
test <- quadrat.test(X, nx = 4)
# Default plot:
plot(test, main = "")
# Extract the `quadratcount` object (regions with observed counts):
counts <- attr(test, "quadratcount")
# Convert to `tess` (raw regions with no numbers)
regions <- as.tess(counts)
# Add residuals as marks to the tessellation:
marks(regions) <- test$residuals
# Plot regions with marks as colors:
plot(regions, do.col = TRUE, main = "")

r coding for customising vegan plot

I am attempting to produce an NMDS plot in vegan, but I am really struggling with the code. I am trying to display the site points and species points differently, with the site points coloured according to treatment. Both lines work individually, but I cannot work out how to combine them into a single graph. I am using ordipointlabel to prevent overlap. These are the two lines of code I want to combine:
ordipointlabel(NMDS10, scaling=2, display="species", select=sel)
ordipointlabel(NMDS10,display="sites", col=c(rep("darkgreen",4),rep("blue4",4)),cex=0.75)
You can access the ordipointlabel object directly and make it look however you wish. Please see the example:
library(vegan)
data(dune)
NMDS10 <- metaMDS(dune[1:8, ])
pdf(file = NULL)  # draw to a null device; only the returned object is needed
y <- ordipointlabel(NMDS10, display=c("sites", "species"))
dev.off()
# select sites & species
sel <- unlist(dimnames(dune[1:8, ]))[-(20:ncol(dune))]
# modify the ordipointlabel object directly: keep only the selected points
y$points <- y$points[rownames(y$points) %in% sel, ]
# everything red first, then the 8 sites coloured by treatment
y$args$pcol[] <- "red"
y$args$pcol[1:8] <- c(rep("darkgreen", 4), rep("blue4", 4))
y$par$cex <- 0.75
plot(y)

Understanding the duplication of plots in cowplot plot_grid

In desperate need of a sanity check. I am struggling to see why the result of plot_grid (cowplot) of N plots in my code is producing N identical plots. From the list I provide, I've taken out each data frame to verify that each plot should be different, however, when I pass in the complete list to plot_grid they all look identical.
p <- vector("list",length(dataList))
for(i in 1:length(dataList)) {
df <- dataList[[i]]
p[[i]] <- ggplot(df, aes(df$base)) + geom_bar()
}
multi <- plot_grid(plotlist=p, align="hv")
save_plot(paste("data_freqs.tiff",sep=""), multi, dpi=300, base_aspect_ratio=1.5)
For example, when I type the following, I can see that the data are different:
a<-dataList[[1]]
b<-dataList[[2]]
sum(a$base=="T")
>1245
sum(b$base=="T")
>1034
However, I end up with multiple plots of identical T values (all fixed to 1245).
Any help is much appreciated.
Thanks
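A likely cause: `aes(df$base)` stores the expression `df$base`, not its value, and ggplot2 evaluates it only when the plot is built (here, inside plot_grid). By then `df` in the calling environment holds the last data frame from the loop, so every plot shows the same data. A minimal sketch of the fix, mapping the column by name so each plot reads from its own `data`; the toy `dataList` below is made up for illustration.

```r
library(ggplot2)

# Toy stand-in for dataList (hypothetical data)
dataList <- list(
  data.frame(base = c("A", "T", "T", "G")),
  data.frame(base = c("C", "C", "T"))
)

p <- vector("list", length(dataList))
for (i in seq_along(dataList)) {
  df <- dataList[[i]]
  # aes(base), not aes(df$base): the column is looked up in each plot's
  # own data instead of the global `df`, which the loop keeps overwriting
  p[[i]] <- ggplot(df, aes(x = base)) + geom_bar()
}

# Bar heights per plot; they now differ as the data do
bar_counts <- lapply(p, function(pl) layer_data(pl)$count)
```

With the plots built this way, `plot_grid(plotlist = p, align = "hv")` should show distinct panels.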

How can I extract the matrix derived from a heatmap created with gplots after hierarchical clustering?

I am making a heatmap, but I can't assign the result to a variable to check it before plotting; RStudio plots it automatically. I would like to get the list of row names in the order shown in the heatmap. I'm not sure if this is possible. I'm using this code:
library(gplots)        # heatmap.2()
library(RColorBrewer)  # brewer.pal()
hm <- heatmap.2( assay(vsd)[ topVarGenes, ], scale="row",
trace="none", dendrogram="both",
col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255),
ColSideColors = c(Controle="gray", Col1.7G2="darkgreen", JG="blue", Mix="orange")[
colData(vsd)$condition ] )
You can assign the plot to an object. The plot will still be drawn in the plot window; however, you'll also get a list with all the data for each plot element. Then you just need to extract the desired plot elements from the list. For example:
library(gplots)
p = heatmap.2(as.matrix(mtcars), dendrogram="both", scale="row")
p is a list with all the elements of the plot.
p # Outputs all the data in the list; lots of output to the console
str(p) # Structure of p; also lots of output to the console
names(p) # Names of all the list elements
p$rowInd # Ordering of the data rows
p$carpet # The heatmap values
You'll see all the other values associated with the dendrogram and the heatmap if you explore the list elements.
For others: a more complete way to capture a matrix representation of the heatmap created by gplots is:
matrix_map <- p$carpet
matrix_map <- t(matrix_map)
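Putting this together for the original goal (row names in heatmap order), as a sketch with `mtcars` standing in for the expression matrix. Drawing to a null PDF device keeps the plot out of the RStudio pane. heatmap.2 draws the first row of `rowInd` at the bottom, so `rev()` gives the top-to-bottom order seen in the figure.

```r
library(gplots)

mat <- as.matrix(mtcars)

# Draw to a null device so nothing appears in the plot pane
pdf(file = NULL)
hm <- heatmap.2(mat, dendrogram = "both", scale = "row", trace = "none")
dev.off()

# Row names in dendrogram order, read from the top of the heatmap down
row_order_top_down <- rev(rownames(mat)[hm$rowInd])

# hm$carpet is columns x rows (as passed to image()); t() restores
# rows x columns, reordered and scaled as displayed
heat_mat <- t(hm$carpet)
```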

Add to ggplot with element of different length

I'm new to ggplot2 and I'm trying to figure out how I can add a line to an already existing plot I created. The original plot, which is the cumulative distribution of a column of data T1 from a data frame x, has about 100,000 elements in it. I have successfully plotted this using ggplot2 and stat_ecdf() with the code I posted below. Now I want to add another line using a set of (x,y) coordinates, but when I try this using geom_line() I get the error message:
Error in data.frame(x = c(0, 7.85398574631245e-07, 3.14159923334398e-06, :
arguments imply differing number of rows: 1001, 100000
Here's the code I'm trying to use:
library(ggplot2)
set.seed(42)
x <- data.frame(T1=rchisq(100000,1))
ps <- seq(0,1,.001)
ts <- .5*qchisq(ps,1) #50:50 mixture of chi-square (df=1) and 0
p <- ggplot(x,aes(T1)) + stat_ecdf() + geom_line(aes(ts,ps))
That's what produces the error from above. Now here's the code using base graphics that I used to use but that I am now trying to move away from:
plot(ecdf(x$T1),xlab="T1",ylab="Cum. Prob.",xlim=c(0,4),ylim=c(0,1),main="Empirical vs. Theoretical Distribution of T1")
lines(ts,ps)
I've seen some other posts about adding lines in general, but what I haven't seen is how to add a line when the two originating vectors are not of the same length. (Note: I don't want to just use 100,000 (x,y) coordinates.)
As a bonus, is there an easy way, similar to using abline, to add a drop line on a ggplot2 graph?
Any advice would be much appreciated.
ggplot works with data frames, so put ts and ps into their own data frame and pass it to geom_line() through its data argument:
set.seed(42)
x <- data.frame(T1=rchisq(100000,1))
ps <- seq(0,1,.001)
ts <- .5*qchisq(ps,1) #50:50 mixture of chi-square (df=1) and 0
tpdf <- data.frame(ts=ts,ps=ps)
p <- ggplot(x,aes(T1)) + stat_ecdf() + geom_line(data=tpdf, aes(ts,ps))
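The same idea covers the rest of the base-graphics version and the drop-line bonus question. `coord_cartesian()` plays the role of `xlim`/`ylim`, `labs()` sets the titles, `geom_hline()`/`geom_vline()`/`geom_abline()` are the counterparts of `abline()`, and `annotate("segment", ...)` draws a finite drop line. A sketch; the cutoff of 1 and the line styles are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(42)
x <- data.frame(T1 = rchisq(100000, 1))
ps <- seq(0, 1, 0.001)
ts <- 0.5 * qchisq(ps, 1)  # 50:50 mixture of chi-square (df = 1) and 0
tpdf <- data.frame(ts = ts, ps = ps)

cutoff <- 1                  # arbitrary x value for the drop line
F_cut <- ecdf(x$T1)(cutoff)  # empirical CDF at the cutoff

p <- ggplot(x, aes(T1)) +
  stat_ecdf() +
  geom_line(data = tpdf, aes(x = ts, y = ps)) +
  # abline(h = ...) counterpart: full-width horizontal reference line
  geom_hline(yintercept = 0.5, linetype = "dotted") +
  # Drop line from the x axis up to the empirical CDF at the cutoff
  annotate("segment", x = cutoff, xend = cutoff, y = 0, yend = F_cut,
           linetype = "dashed") +
  coord_cartesian(xlim = c(0, 4), ylim = c(0, 1)) +
  labs(x = "T1", y = "Cum. Prob.",
       title = "Empirical vs. Theoretical Distribution of T1")
```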
