I am producing a stratigraphy plot which should look something like the following:
I've got to the point where I can plot the layout using some dummy data and the following code:
library(ggplot2)

# Dummy data: three stratigraphic units (codes 657, 601, 610), each at the same ten distances
Strat <- rep(c(657, 601, 610), each = 10)
Distance <- rep(c(7.87,17.89,22.09,42.84,50.65,55.00,65.74,69.38,72.36,75.31), times = 3)
Altitude <- rep(c(565.05,191.98,808.12,609.19,579.10,657.08,708.00,671.44,312.10,356.14), times = 3)
strat_max <- c(565.05,191.98,808.12,609.19,579.10,657.08,708.00,671.44,312.10,356.14,565.04,176.23,795.52,608.06,567.89,641.83,698.69,664.50,310.21,350.11,526.47,147.30,762.49,601.99,544.22,632.54,689.33,636.40,282.71,313.56)
strat_min <- c(565.04,176.23,795.52,608.06,567.89,641.83,698.69,664.50,310.21,350.11,526.47,147.30,762.49,601.99,544.22,632.54,689.33,636.40,282.71,313.56,463.31,81.01,718.11,594.38,539.53,616.18,670.79,602.96,249.59,289.63)
strat <- data.frame(Strat, Distance, Altitude, strat_max, strat_min)

# linewidth requires ggplot2 >= 3.4.0 (older versions use size/lwd)
ggplot(strat, aes(x = Distance, y = Altitude, colour = factor(Strat))) +
  # thick vertical bar per unit, spanning strat_min to strat_max
  geom_linerange(aes(ymin = strat_min, ymax = strat_max), linewidth = 10) +
  # Altitude profile drawn as a single black line
  geom_line(linetype = 1, linewidth = 1.5, colour = "black") +
  xlab("Distance") + ylab("Altitude") +
  theme_bw() + scale_colour_discrete(name = "Stratigraphy Type")
However, I have been unable to add the relevant patterns. Each rock/sediment type has a standard USGS plotting pattern for use in stratigraphic plots, and I would like to relate each code in strat$Strat to its pattern and fill the corresponding bar with it.
Does anyone know how to import these patterns (e.g. as PNG files) and then use them as fills? I had thought of passing them as colours, but I don't know whether that would work, and there is probably no mechanism for telling R how to tile them. Currently I produce the plot as shown and then add the patterns in Adobe Illustrator.
Any insight appreciated!
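Relating each Strat code to its pattern can be done independently of how the patterns are eventually drawn. A minimal sketch, assuming one image file per code: a named character vector serves as a lookup table. The file names below are hypothetical placeholders (not real USGS file names), and this step only builds the mapping; it does not draw any patterns.

```r
# Hypothetical lookup table: one pattern image per code in Strat.
# File names are placeholders, not actual USGS file names.
pattern_files <- c("601" = "usgs_pattern_601.png",
                   "610" = "usgs_pattern_610.png",
                   "657" = "usgs_pattern_657.png")

# Same layout as the dummy data: three units, ten rows each
strat <- data.frame(Strat = rep(c(657, 601, 610), each = 10))

# Index the named vector by the code (as character) to get each row's file;
# a code without an entry would come back as NA
strat$pattern_file <- unname(pattern_files[as.character(strat$Strat)])

print(unique(strat))
```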
I have a plot of categorical variables as below:
http://i.imgur.com/d1hJP21.png
This is a very small subset of the actual data (n > 10,000).
While jittering handles the overplotting, it is ugly and can lead to ambiguity. I would prefer to place bubbles that show the number of coincident points.
I can't seem to find a simple and repeatable way to do this.
Thank you in advance!
Edit:
Thanks for the feedback. Here is what I hope is a reproducible example:
First, a CSV of the data (long, but relevant in this example):
ID,g,wf,fi
1824848,14,2,4
1314001,14,2,3
670960,14,1,3
1313235,15,3,4
1172304,3,5,4
1859973,15,1,3
1826951,14,1,4
1868238,15,1,2
1911869,15,1,4
1911861,15,1,2
926829,14,1,3
1609578,3,4,4
1306895,3,5,4
1199557,15,1,4
692849,10,3,4
1923352,3,5,4
1881724,4,4,4
1384603,3,5,4
1928829,15,1,4
493503,3,5,4
902650,15,1,3
1887582,6,4,4
1887584,3,5,4
1933992,13,1,4
635372,3,3,4
1892765,15,1,2
1934773,13,2,4
1892530,14,2,4
936786,3,5,4
1897585,13,3,4
1895932,15,1,3
422785,15,1,3
1219573,8,1,4
1897817,3,2,4
1899612,14,3,4
1939157,15,1,4
1952043,14,1,3
1938048,14,1,3
1896607,15,1,2
1941385,15,1,3
1959437,3,5,4
1064010,15,1,3
1951600,13,3,4
541439,15,1,4
1938609,3,5,4
1958667,15,1,2
1943792,10,1,4
1943782,14,1,4
1893714,14,1,4
1335502,15,1,1
1950179,3,2,4
1959069,15,1,2
1958811,15,1,2
1958808,15,3,4
1959878,15,1,1
1949904,15,1,3
1961475,15,1,4
1876863,15,1,4
384705,15,1,3
1966338,15,1,4
1980290,3,4,4
1966997,15,2,4
1967107,15,1,1
1976077,15,1,2
1967579,11,1,4
1967387,4,2,4
1973408,3,3,4
1684881,3,3,3
...and the plot code:
library(ggplot2)

# dx holds the CSV above (columns ID, g, wf, fi)
sx <- ggplot(dx, aes(x = fi, y = wf)) +
  geom_point(shape = 19, alpha = 1, size = 1,
             position = position_jitter(width = 0.1, height = 0.1))
print(sx)
I really don't know where to go from here, other than manually making a count matrix...
Thanks again (sorry, new to stackoverflow).
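A minimal sketch of the bubble approach: count how many rows share each (fi, wf) pair with aggregate(), then map that count to point size. Only the first seven rows of the CSV are inlined so the block is self-contained; with the full dx the aggregate() call is unchanged.

```r
library(ggplot2)

# First seven rows of the CSV above (columns ID, g, wf, fi)
dx <- read.csv(text = "ID,g,wf,fi
1824848,14,2,4
1314001,14,2,3
670960,14,1,3
1313235,15,3,4
1172304,3,5,4
1859973,15,1,3
1826951,14,1,4")

# Number of rows sharing each (fi, wf) combination
counts <- aggregate(ID ~ fi + wf, data = dx, FUN = length)
names(counts)[names(counts) == "ID"] <- "n"

# Bubble area proportional to the count, instead of jittering
sx <- ggplot(counts, aes(x = fi, y = wf, size = n)) +
  geom_point(shape = 19) +
  scale_size_area()
print(sx)
```

ggplot2 also provides geom_count(), which does this counting internally, so ggplot(dx, aes(x = fi, y = wf)) + geom_count() should give a similar plot without the explicit aggregate() step.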
Is it (reasonably) possible to plot a sequence logo plot using ggplot2?
There is a package to do it which is based on "grid" called "seqLogo", but I was wondering if there could be a ggplot2 version of it.
Thanks.
ggseqlogo should be what you're looking for. I hope it relieves some of the frustration I'm sure many of you have had when plotting sequence logos in R.
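A short sketch of ggseqlogo in use, assuming the package is installed. It reuses the toy alignment from the RWebLogo example; ggseqlogo() accepts a character vector of aligned sequences and returns a ggplot object, so further ggplot2 layers and themes can be added to it.

```r
library(ggseqlogo)

# Toy alignment of four DNA sequences (same as the RWebLogo example)
aln <- c('CCAACCCAA', 'CCAACCCTA', 'AAAGCCTGA', 'TGAACCGGA')

# method = "bits" scales letter heights by information content;
# method = "prob" would show plain per-position frequencies instead
p <- ggseqlogo(aln, method = "bits")
print(p)
```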
I'm submitting a ggplot2 attempt that is somewhat similar to the Leipzig/Berry solution above. This format is a little bit closer to the standard logogram.
But my solution, and I think any ggplot2 solution, still falls short because ggplot2 does not offer control over the aspect ratio of plotting symbols. This is the core capability that (I think) is required for generating sequence logos and that is missing from ggplot2.
Also note: I used the data from Jeremy Leipzig's answer, but I did not apply any correction for small sample sizes or for GC content other than 50%.
library(ggplot2)
library(reshape2)

freqs<-matrix(data=c(0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49),byrow=TRUE,nrow=4,dimnames=list(c('A','C','G','T')))

# One row per position, one column per base
freqdf <- as.data.frame(t(freqs))
freqdf$pos <- seq_len(nrow(freqdf))

# Column height = information content in bits: 2 - H, with H = -sum(p * log2(p));
# zero frequencies are dropped, i.e. 0 * log2(0) is treated as 0
freqdf$height <- apply(freqdf[, c('A', 'C', 'G', 'T')], MARGIN = 1,
                       FUN = function(p) { p <- p[p > 0]; 2 + sum(p * log2(p)) })

# Letter heights: base frequency scaled by the column's information content
logodf <- data.frame(A = freqdf$A * freqdf$height, C = freqdf$C * freqdf$height,
                     G = freqdf$G * freqdf$height, T = freqdf$T * freqdf$height,
                     pos = freqdf$pos)
lmf <- melt(logodf, id.vars = 'pos')

# The 'order' aesthetic is deprecated in ggplot2; stacking follows the factor order
p <- ggplot(data = lmf, aes(x = pos, y = value)) +
  geom_col(aes(fill = variable), position = 'stack', alpha = 0.5) +
  geom_text(aes(label = variable, size = value, vjust = value),
            position = 'stack') +
  theme_bw()
print(p)

# Portable replacement for the macOS-only quartz()/quartz.save() calls
ggsave('StackOverflow_5438474.png', p, width = 8, height = 3)
That produces this graph:
I have implemented an alternative designed by Charles Berry, which addresses some of the weaknesses of seqLogos discussed ad nauseam in the comment section below. It uses ggplot2:
library("devtools")
install_github("leipzig/berrylogo")
library("berrylogo")
freqs<-matrix(data=c(0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49),byrow=TRUE,nrow=4,dimnames=list(c('A','C','G','T')))
p<-berrylogo(freqs,gc_content=.41)
print(p)
There is no direct way to do this in ggplot2, as far as I know.
However, check out RWebLogo. It's an R wrapper I have written for the WebLogo Python library. You can download it from CRAN, and it's hosted on GitHub.
Simple example:
# Load package
library('RWebLogo')
# Sample alignment
aln <- c('CCAACCCAA', 'CCAACCCTA', 'AAAGCCTGA', 'TGAACCGGA')
# Plot logo to file
weblogo(seqs=aln, file.out='logo.pdf')
# Plot logo to R graphics device (uses generated jpeg logo and raster package)
weblogo(seqs=aln, plot=TRUE, open=FALSE, format='jpeg', resolution=600)
For more options, see ?weblogo or ?plotlogo.
Here is an alternative option: motiflogo, a new motif (sequence) logo representation implemented with ggplot2. It can be used in two ways:
As a canonical motif logo representation
As a SNP-specific motif logo representation
There is now a gglogo package (also on CRAN, yet another amazing ggplot2 extension by Heike Hofmann).
This package produces plots like these:
library(ggplot2)
library(gglogo)
ggplot(data = ggfortify(sequences, "peptide")) +
geom_logo(aes(x=position, y=bits, group=element,
label=element, fill=interaction(Polarity, Water)),
alpha = 0.6) +
scale_fill_brewer(palette="Paired") +
theme(legend.position = "bottom")
The example is from https://github.com/heike/gglogo/blob/master/visual_test/logos.R and there's a manuscript on the package here: https://github.com/heike/logopaper/blob/master/logos.Rmd
Using this dummy data saved in a file named foo.txt:
COG,station1,station2,station3,station4,station5
COG000Z,0.019393497,0.183122497,0.089911227,0.283250444,0.074110521
COG0002,0.044632051,0.019118032,0.034625785,0.069892277,0.034073709
COG0001,0.033066112,0,0,0,0
COG0004,0.115086472,0.098805295,0.148167492,0.040019101,0.043982814
COG0005,0.064613057,0.03924007,0.105262559,0.076839235,0.031070155
COG0006,0.079920475,0.188586049,0.123607421,0.27101229,0.274806929
COG0007,0.051727492,0.066311584,0.080655401,0.027024185,0.059156417
COG0008,0.126254841,0.108478559,0.139106704,0.056430812,0.099823028
I made a heatmap in ggplot2 with the following code, adapted from this answer on Stack Exchange:
> library(ggplot2)
> library(reshape2)
> foo = read.table('foo.txt', header=TRUE, sep=',')
> foomelt = melt(foo)
Using COG as id variables
> ggplot(foomelt, aes(x=COG, y=variable, fill=value)) + geom_tile() + scale_fill_gradient(low='white', high='steelblue')
It produces a really nice heatmap, but what I'm really after is the colour code of each tile (basically the original foo, but with a colour code in place of each value). Any idea how to go about this?
I'm working on pulling all scale-related code from ggplot2 into a separate package; this will make it much easier to use the same scales in different ways. See https://github.com/hadley/scales for the in-progress code.
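A minimal sketch of that idea with the scales package as it exists today. scale_fill_gradient() builds its colours from the scales palette seq_gradient_pal() (also available as pal_seq_gradient() in recent scales releases) after rescaling the values to [0, 1] over the data range, so applying the same two steps to foo gives one colour code per cell without drawing anything. This assumes the default Lab interpolation and default limits; if you set limits on the scale, pass the same range to rescale(from = ...).

```r
library(scales)

# The dummy data from foo.txt
foo <- read.csv(text = "COG,station1,station2,station3,station4,station5
COG000Z,0.019393497,0.183122497,0.089911227,0.283250444,0.074110521
COG0002,0.044632051,0.019118032,0.034625785,0.069892277,0.034073709
COG0001,0.033066112,0,0,0,0
COG0004,0.115086472,0.098805295,0.148167492,0.040019101,0.043982814
COG0005,0.064613057,0.03924007,0.105262559,0.076839235,0.031070155
COG0006,0.079920475,0.188586049,0.123607421,0.27101229,0.274806929
COG0007,0.051727492,0.066311584,0.080655401,0.027024185,0.059156417
COG0008,0.126254841,0.108478559,0.139106704,0.056430812,0.099823028")

vals <- as.matrix(foo[, -1])

# Same palette as scale_fill_gradient(low = "white", high = "steelblue")
pal <- seq_gradient_pal(low = "white", high = "steelblue")

# Rescale all cells to [0, 1] over the data range, then look up colours
cols <- pal(rescale(as.vector(vals)))

# Put the colour codes back into the shape of the original table
colour_table <- data.frame(COG = foo$COG,
                           matrix(cols, nrow = nrow(vals), dimnames = dimnames(vals)))
print(colour_table)
```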
Rather than extracting the colours from the plot, use colorRampPalette:
a<-colorRampPalette(c("white","steelblue"))
plot_colours<-a(n)
where n is the number of colours in your heat map. In your example, I get n=6 so:
n<-6
a(n)
returns
[1] "#FFFFFF" "#DAE6F0" "#B4CDE1" "#90B3D2" "#6A9BC3" "#4682B4"
and
image(1:n,1,as.matrix(1:n),col=a(n))
returns