Plotting a "sequence logo" using ggplot2? - r

Is it (reasonably) possible to plot a sequence logo plot using ggplot2?
There is a package to do it which is based on "grid" called "seqLogo", but I was wondering if there could be a ggplot2 version of it.
Thanks.

ggseqlogo should be what you're looking for. I hope this can relieve some of the frustrations I’m sure many of you have when it comes to plotting sequence logos in R.
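For reference, a minimal sketch of how a call might look. It assumes the ggseqlogo package is installed, and the four-sequence alignment is just a toy example (the same one used in the RWebLogo answer in this thread), not real data:

```r
library(ggplot2)
library(ggseqlogo)

# Toy alignment: equal-length DNA sequences as a character vector
aln <- c("CCAACCCAA", "CCAACCCTA", "AAAGCCTGA", "TGAACCGGA")

# method = "bits" scales each stack by information content;
# method = "prob" would show plain letter probabilities instead
p <- ggseqlogo(aln, method = "bits")
print(p)
```

Since the result is an ordinary ggplot object, the usual ggplot2 layers and themes can be added to it.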

I'm submitting a ggplot2 attempt that is somewhat similar to the Leipzig/Berry solution above. This format is a little bit closer to the standard logogram.
But my solution, and I think any ggplot2 solution, still falls short because ggplot2 does not offer control over the aspect ratio of plotting symbols. This is the core capability that (I think) is required for generating sequence logos and that is missing from ggplot2.
Also note: I used the data from Jeremy Leipzig's answer, but I did not do any corrections for small sample sizes or for %GC values different than 50%.
require(ggplot2)
require(reshape2)
freqs<-matrix(data=c(0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49),byrow=TRUE,nrow=4,dimnames=list(c('A','C','G','T')))
freqdf <- as.data.frame(t(freqs))
freqdf$pos = as.numeric(as.character(rownames(freqdf)))
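# Information content per position: 2 bits minus the Shannon entropy.
# sum(log2(x^x)) equals sum(x*log2(x)), i.e. minus the entropy, so it is added.
freqdf$height <- apply(freqdf[,c('A', 'C','G','T')], MARGIN=1,
FUN=function(x){2+sum(log(x^x,base=2))})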
logodf <- data.frame(A=freqdf$A*freqdf$height, C=freqdf$C*freqdf$height,
G=freqdf$G*freqdf$height, T=freqdf$T*freqdf$height,
pos=freqdf$pos)
lmf <- melt(logodf, id.var='pos')
quartz(height=3, width=8) # macOS only; on other platforms use dev.new(height=3, width=8)
ggplot(data=lmf, aes(x=as.numeric(as.character(pos)), y=value)) +
geom_bar(aes(fill=variable,order=value), position='stack',
stat='identity', alpha=0.5) +
geom_text(aes(label=variable, size=value, order=value, vjust=value),
position='stack') +
theme_bw()
quartz.save('StackOverflow_5438474.png', type='png')
That produces this graph:
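For anyone unsure how the stack heights in a standard logo are derived: the total height at a position is its information content, log2(4) minus the Shannon entropy, and each letter gets a share proportional to its frequency. Here is a small base-R sketch on a made-up three-position matrix (the values and the names pfm, info_content and letter_heights are only for illustration, and there is no small-sample or GC correction):

```r
# Toy position frequency matrix: rows = bases, columns = positions
pfm <- matrix(c(1.00, 0.25, 0.50,
                0.00, 0.25, 0.50,
                0.00, 0.25, 0.00,
                0.00, 0.25, 0.00),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

info_content <- function(p) {
  # 0 * log2(0) is treated as 0, so zero frequencies are dropped first
  nz <- p[p > 0]
  log2(length(p)) + sum(nz * log2(nz))
}

# Total stack height (in bits) for each position
ic <- apply(pfm, 2, info_content)

# Height of each letter = its frequency times the stack height
letter_heights <- sweep(pfm, 2, ic, `*`)
```

A fully conserved position gets the maximum of 2 bits, a uniform position gets 0, and a 50/50 split between two bases gets 1 bit.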

I have implemented an alternative designed by Charles Berry, which addresses some of the weaknesses of seqLogos discussed ad nauseam in the comment section below. It uses ggplot2:
library("devtools")
install_github("leipzig/berrylogo")
library("berrylogo")
freqs<-matrix(data=c(0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49),byrow=TRUE,nrow=4,dimnames=list(c('A','C','G','T')))
p<-berrylogo(freqs,gc_content=.41)
print(p)

There is no direct way to do this in ggplot2, as far as I know.
However, check out RWebLogo. It's an R wrapper I have written for the WebLogo Python library. You can download it from CRAN, and it's hosted on GitHub.
Simple example:
# Load package
library('RWebLogo')
# Sample alignment
aln <- c('CCAACCCAA', 'CCAACCCTA', 'AAAGCCTGA', 'TGAACCGGA')
# Plot logo to file
weblogo(seqs=aln, file.out='logo.pdf')
# Plot logo to R graphics device (uses generated jpeg logo and raster package)
weblogo(seqs=aln, plot=TRUE, open=FALSE, format='jpeg', resolution=600)
For more options see ?weblogo or ?plotlogo

Here is an alternative option. motiflogo is a new representation of the motif (sequence) logo, implemented with ggplot2. It supports two kinds of representation:
A canonical motif logo representation
A SNP-specific motif logo representation

There is now a gglogo package (also on CRAN), yet another amazing ggplot2 extension by Heike Hofmann.
This package produces plots like these:
library(ggplot2)
library(gglogo)
ggplot(data = ggfortify(sequences, "peptide")) +
geom_logo(aes(x=position, y=bits, group=element,
label=element, fill=interaction(Polarity, Water)),
alpha = 0.6) +
scale_fill_brewer(palette="Paired") +
theme(legend.position = "bottom")
The example is from https://github.com/heike/gglogo/blob/master/visual_test/logos.R and there's a manuscript on the package here: https://github.com/heike/logopaper/blob/master/logos.Rmd

Related

How to plot 3D scatter diagram using ggplot?

I tried to use the plotly package, but it is not working in my case at all. The ggplot package is working for 2D plots but it is giving an error when adding one more axis. How to solve this issue?
ggplot(data,aes(x=D1,y=D2,z=D3,color=Sample)) +
geom_point()
How to add one more axis and get the 3D plot in this?
Since you tagged your question with plotly and said that you've tried to use it with plotly, I think it would be helpful to give you a working code solution in plotly:
Creating some data to plot with:
set.seed(417)
library(plotly)
temp <- rnorm(100, mean=30, sd=5)
pressure <- rnorm(100)
dtime <- 1:100
Graphing your 3d scatterplot using plotly's scatter3d type:
plot_ly(x=temp, y=pressure, z=dtime, type="scatter3d", mode="markers", color=temp)
Renders the following:
As others have noted, ggplot2 by itself does not support 3D graphics rendering.
A possible solution is gg3D.
gg3D is a package created to extend ggplot2 to produce 3D plots. It does exactly what you are asking for: it adds a third axis to a ggplot. I find it quite good and easy to use and that is what I use for my limited needs.
An example taken from the vignette to produce a basic plot
devtools::install_github("AckerDWM/gg3D")
library("gg3D")
## An empty plot with 3 axes
qplot(x=0, y=0, z=0, geom="blank") +
theme_void() +
axes_3D()
## Axes can be populated with points using the function stat_3D.
data(iris)
ggplot(iris, aes(x=Petal.Width, y=Sepal.Width, z=Petal.Length, color=Species)) +
theme_void() +
axes_3D() +
stat_3D()
There are other options not involving ggplot. For example the excellent plot3D package with its extension plot3Drgl to plot in openGL.
In your question, you refer to the plotly package and to the ggplot2 package. Both plotly and ggplot2 are great packages: plotly is good at creating dynamic plots that users can interact with, while ggplot2 is good at creating static plots for extreme customization and scientific publication. It is also possible to send ggplot2 output to plotly. Unfortunately, at the time of writing (April 2021), ggplot2 does not natively support 3d plots. However, there are other packages that can be used to produce 3d plots and some ways to get pretty close to ggplot2 quality. Below I review several options. These suggestions are by no means exhaustive.
plotly
See onlyphantom's answer in this thread.
gg3D
See Marco Stamazza's answer in this thread. See also my effort below.
scatterplot3d
See Seth's answer in a related thread.
lattice
See Backlin's answer in a related thread.
rgl
See this overview in the wiki guide.
rayshader
See this overview of this package's wonderful capabilities.
trans3d
See data-imaginist use trans3d to get a cube into ggplot2.
ggrgl
See this cool and useful coolbutuseless introduction.
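To make the trans3d idea from the list above a bit more concrete, here is a minimal sketch of the general approach (my own illustration, not the code from the linked post). It assumes you are happy with a fixed viewing angle: persp() is called once on an off-screen device only to obtain a projection matrix, trans3d() flattens the 3D points to 2D, and the flattened points are then plotted as an ordinary ggplot. The random points and the names pts, flat and pmat are made up for the example:

```r
# Off-screen device: persp() is used only for the viewing matrix it returns
pdf(NULL)
pmat <- persp(x = c(0, 1), y = c(0, 1), z = matrix(0, 2, 2),
              zlim = c(0, 1), theta = 30, phi = 20)
invisible(dev.off())

# Made-up 3D points inside the unit cube
set.seed(1)
pts <- data.frame(x = runif(50), y = runif(50), z = runif(50))

# Project the 3D coordinates onto the 2D viewing plane
proj <- trans3d(pts$x, pts$y, pts$z, pmat)
flat <- data.frame(px = proj$x, py = proj$y, z = pts$z)

# Plot the projected points; colour still carries the original z value
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(flat, aes(px, py, colour = z)) +
    geom_point() +
    coord_equal() +
    theme_void()
  print(p)
}
```

The result is a static 2D picture of a 3D scene, so axes, grid lines and depth ordering have to be drawn by hand if you need them.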
Now let me review some of my efforts with the Lorenz attractor trajectories. While customization remains limited, I've had best results for PDF output with gg3D. I also include a ggrgl example.
gg3D
# Packages
library(deSolve)
library(ggplot2)
library(gg3D) # remotes::install_github("AckerDWM/gg3D")
# Directory
setwd("~/R/workspace/")
# Parameters
parms <- c(a=10, b=8/3, c=28)
# Initial state
state <- c(x=0.01, y=0.0, z=0.0)
# Time span
times <- seq(0, 50, by=1/200)
# Lorenz system
lorenz <- function(times, state, parms) {
with(as.list(c(state, parms)), {
dxdt <- a*(y-x)
dydt <- x*(c-z)-y
dzdt <- x*y-b*z
return(list(c(dxdt, dydt, dzdt)))
})
}
# Make dataframe
df <- as.data.frame(ode(func=lorenz, y=state, parms=parms, times=times))
# Make plot
make_plot <- function(theta=0, phi=0){
ggplot(df, aes(x=x, y=y, z=z, colour=time)) +
axes_3D(theta=theta, phi=phi) +
stat_3D(theta=theta, phi=phi, geom="path") +
labs_3D(theta=theta, phi=phi,
labs=c("x", "y", "z"),
angle=c(0,0,0),
hjust=c(0,2,2),
vjust=c(2,2,-2)) +
ggtitle("Lorenz butterfly") +
theme_void() +
theme(legend.position = "none")
}
make_plot()
make_plot(theta=180,phi=0)
# Save plot as PDF
ggsave(last_plot(), filename="lorenz-gg3d.pdf")
Pros: Outputs high-quality PDF:
Cons: Still limited customization. But for my specific needs, currently the best option.
ggrgl
# Packages
library(deSolve)
library(ggplot2)
library(rgl)
#remotes::install_github("dmurdoch/rgl")
library(ggrgl)
# remotes::install_github('coolbutuseless/ggrgl', ref='main')
library(devout)
library(devoutrgl)
# remotes::install_github('coolbutuseless/devoutrgl', ref='main')
library(webshot2)
# remotes::install_github("rstudio/webshot2")
library(ggthemes)
# Directory
setwd("~/R/workspace/")
# Parameters
parms <- c(a=10, b=8/3, c=26.48)
# Initial state
state <- c(x=0.01, y=0.0, z=0.0)
# Time span
times <- seq(0, 100, by=1/500)
# Lorenz system
lorenz <- function(times, state, parms) {
with(as.list(c(state, parms)), {
dxdt <- a*(y-x)
dydt <- x*(c-z)-y
dzdt <- x*y-b*z
return(list(c(dxdt, dydt, dzdt)))
})
}
# Make dataframe
df <- as.data.frame(ode(func=lorenz, y=state, parms=parms, times=times))
# Make plot
ggplot(df, aes(x=x, y=y, z=z)) +
geom_path_3d() +
ggtitle("Lorenz butterfly") -> p
# Render Plot in window
rgldev(fov=30, view_angle=-10, zoom=0.7)
p + theme_ggrgl(16)
# Save plot as PNG
rgldev(fov=30, view_angle=-10, zoom=0.7,
file = "~/R/Work/plots/lorenz-attractor/ggrgl/lorenz-ggrgl.png",
close_window = TRUE, dpi = 300)
p + theme_ggrgl(16)
dev.off()
Pros: The plot can be rotated in a way similar to plotly. It is possible to 'theme' a basic plot:
Cons: The figure is missing a third axis with labels, and I could not output high-quality plots. I was able to view and save a low-quality black trajectory as PNG; I could also view a colored trajectory like the one above, but could not save it except as a low-quality screenshot:
Related threads: plot-3d-data-in-r, ploting-3d-graphics-with-r.

Plot histograms or pie charts in a scatter plot

I need to reproduce what was done in "tiny pie charts to represent each point in an scatterplot using ggplot2", but I ran into the problem that the ggsubplot package is not available for R version 3.3.1.
Essentially, I need a histogram or a pie chart at predefined points on the scatterplot. Here is the same code that is used in the cited post:
foo <- data.frame(X=runif(30),Y=runif(30),A=runif(30),B=runif(30),C=runif(30))
foo.m <- melt(foo, id.vars=c("X","Y"))
ggplot(foo.m, aes(X,Y))+geom_point()
ggplot(foo.m) +
geom_subplot2d(aes(x = X, y = Y, subplot = geom_bar(aes(variable,
value, fill = variable), stat = "identity")), width = rel(.5), ref = NULL)
The code used libraries reshape2, ggplot2 and ggsubplot.
The image that I want to see is in the post cited above
UPD: I downloaded the older versions of R (3.0.2 and 3.0.3) and checkpoint package, and used:
checkpoint("2014-09-18")
as was described in the comment below. But I get an error:
Using binwidth 0.0946
Using binwidth 0.0554
Error in layout_base(data, vars, drop = drop) :
At least one layer must contain all variables used for facetting
Which I can't get around, because when I try to include facet, the following error comes up:
Error: ggsubplots do not support facetting
It doesn't look like ggsubplot is going to fix itself any time soon. One option would be to use the checkpoint package, and essentially "reset" your copy of R to a time when the package was compatible. This post suggests using a time point of 2014-09-18.
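If pinning R to an old snapshot is not an option, small bar charts at each point can be approximated in plain ggplot2 by computing the bar rectangles yourself and drawing them with geom_rect. To be clear, this is a different technique from ggsubplot, and the bar width and height scale (bar_w, bar_h) are arbitrary values chosen for this sketch:

```r
set.seed(42)
foo <- data.frame(X = runif(5), Y = runif(5),
                  A = runif(5), B = runif(5), C = runif(5))

vars  <- c("A", "B", "C")
bar_w <- 0.02  # width of one bar, in x-axis units (arbitrary)
bar_h <- 0.10  # height of a bar whose value is 1, in y-axis units (arbitrary)

# One rectangle per point and variable; bars sit side by side,
# centred horizontally on the point and starting at its Y value
rects <- do.call(rbind, lapply(seq_along(vars), function(i) {
  offset <- length(vars) * bar_w / 2
  data.frame(variable = vars[i],
             xmin = foo$X + (i - 1) * bar_w - offset,
             xmax = foo$X + i * bar_w - offset,
             ymin = foo$Y,
             ymax = foo$Y + foo[[vars[i]]] * bar_h)
}))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(rects) +
    geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                  fill = variable)) +
    labs(x = "X", y = "Y")
  print(p)
}
```

Because the bars are drawn in data units, their apparent size changes with the axis ranges, so bar_w and bar_h need tuning for real data.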

How to furnish a ggplot2 figure with a hyperlink?

I am trying to furnish a ggplot2 plot with a hyperlink:
This works:
library(gridSVG)
library(lattice)
xyplot(mpg~wt, data=mtcars, main = "Link to R-project home")
mainGrobName <- grep("main", grid.ls()[[1]], value=TRUE)
grid.hyperlink(mainGrobName, "http://www.r-project.org")
gridToSVG("HyperlinkExample.svg")
This does not:
p = ggplot(mtcars, aes(wt, mpg)) + geom_point()+ labs(title="link")
print(p)
mainGrobName <- grep("title", grid.ls()[[1]], value=TRUE)
grid.hyperlink(mainGrobName, "http://www.r-project.org")
gridToSVG("HyperlinkExample.svg")
Any hints on this?
I have asked Simon Potter, one of the authors of the gridSVG package:
Here is his (working) answer:
I suggest you try the development version here:
http://r-forge.r-project.org/R/?group_id=1025
It contains a workaround specifically to deal with gTables (and therefore ggplot2 graphics).
So to try and get your example to work, start up a new R session and run the following code:
install.packages("gridSVG", repos="http://R-Forge.R-project.org")
library(gridSVG)
library(ggplot2)
(p <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + labs(title="link"))
titleGrobName <- grep("title", grid.ls(print=FALSE)$name, value=TRUE)
grid.hyperlink(titleGrobName, "http://www.r-project.org/")
gridToSVG("HyperlinkExample.svg", "none", "none")
The only real difference here is the additional parameters given to gridToSVG(). These mainly reduce the output to just the SVG file and an HTML wrapper (otherwise you also get some JSON data, which is not useful for your example).
As for getting the correct mainGrobName, this code succeeds, and setting print=FALSE avoids the distracting side effect of printing the whole grob listing:
grep("title", grid.ls(print=FALSE)$name, value=TRUE)
#[1] "title.2-4-2-4"
The structure of the plot object is clearly more complex than in the lattice situation and the gridToSVG does not capture it by default:
grep("title", grid.ls()$name, value=TRUE)
#--------------------
GRID.gTableParent.125
background.1-5-6-1
spacer.4-3-4-3
panel.3-4-3-4
grill.gTree.103
panel.background.rect.94
panel.grid.minor.y.polyline.96
panel.grid.minor.x.polyline.98
panel.grid.major.y.polyline.100
panel.grid.major.x.polyline.102
geom_point.points.90
panel.border.zeroGrob.91
axis-l.3-3-3-3
axis.line.y.zeroGrob.113
axis
axis-b.4-4-4-4
axis.line.x.zeroGrob.107
axis
xlab.5-4-5-4
ylab.3-2-3-2
title.2-4-2-4
This is also interesting output but I fail to see how I can get gridToSVG to convert it into a useful HTML object:
grid.ls( print=pathListing )$name

Extract color information from ggplot2?

Using this dummy data saved in a file named foo.txt...
COG,station1,station2,station3,station4,station5
COG000Z,0.019393497,0.183122497,0.089911227,0.283250444,0.074110521
COG0002,0.044632051,0.019118032,0.034625785,0.069892277,0.034073709
COG0001,0.033066112,0,0,0,0
COG0004,0.115086472,0.098805295,0.148167492,0.040019101,0.043982814
COG0005,0.064613057,0.03924007,0.105262559,0.076839235,0.031070155
COG0006,0.079920475,0.188586049,0.123607421,0.27101229,0.274806929
COG0007,0.051727492,0.066311584,0.080655401,0.027024185,0.059156417
COG0008,0.126254841,0.108478559,0.139106704,0.056430812,0.099823028
I made a heatmap in ggplot2 with the code below, following this answer on stackexchange.
> library(ggplot2)
> library(reshape2)
> foo = read.table('foo.txt', header=T, sep=',')
> foomelt = melt(foo)
Using COG as id variables
> ggplot(foomelt, aes(x=COG, y=variable, fill=value)) + geom_tile() + scale_fill_gradient(low='white', high='steelblue')
It produces a really nice heatmap, but I'm really just after the color codes of each tile (basically the original foo but with color codes in place of each variable). Any idea how to go about this?
I'm working on pulling all scale related code from ggplot2 into a separate package - this will make it much easier to use the same scales in different ways. See https://github.com/hadley/scales for the in progress code.
Rather than extracting the colours from the plot, use colorRampPalette:
a<-colorRampPalette(c("white","steelblue"))
plot_colours<-a(n)
where n is the number of colours in your heat map. In your example, I get n=6 so:
n<-6
a(n)
returns
[1] "#FFFFFF" "#DAE6F0" "#B4CDE1" "#90B3D2" "#6A9BC3" "#4682B4"
and
image(1:n,1,as.matrix(1:n),col=a(n))
returns
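To get one colour code per tile rather than a palette of n colours, the values can be rescaled to [0, 1] over their full range and passed through colorRamp. This is only a sketch on a few values taken from foo.txt. Note that it interpolates in RGB space, while ggplot2's gradient scale interpolates in Lab space as far as I know, so mid-range colours may differ slightly from the plot (colorRamp also accepts space = "Lab"):

```r
# A few rows and columns of the example data
foo <- data.frame(COG      = c("COG0001", "COG0002", "COG0004"),
                  station1 = c(0.033066112, 0.044632051, 0.115086472),
                  station2 = c(0, 0.019118032, 0.098805295))

vals <- as.matrix(foo[, -1])

# Rescale all values to [0, 1] using the overall range, like a continuous fill scale
rng    <- range(vals)
scaled <- (vals - rng[1]) / (rng[2] - rng[1])

# colorRamp returns a function mapping [0, 1] to RGB rows on a 0-255 scale
ramp <- colorRamp(c("white", "steelblue"))
cols <- rgb(ramp(as.vector(scaled)), maxColorValue = 255)

# Same layout as foo, with hex colour codes in place of the values
colour_codes <- foo
colour_codes[, -1] <- matrix(cols, nrow = nrow(vals))
print(colour_codes)
```

The smallest value maps to white and the largest to steelblue, with everything else in between.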
