ggtree issue with coloring both tips and branches - r

I'm trying to make a plot using ggtree - but I'm having some issues when I try to have both the tip points and branches colored. The tree works with both of these independently, but when I try them together the fill for the nodes is overrided by the color argument from the branch and they come out grey (or it's ignoring it all together and defaulting to the same NA color?).
Here's the minimum code needed to produce the issue:
p <- ggtree(rerooted_tree, aes(color = support))
p <- p %<+% my_DF +
geom_tippoint(aes(fill = as.factor(domains.present)))
p
The variable domains.present is a character column in the dataframe, and works perfectly if it's color instead of fill like in the code below. In the above code, though, if domains.present isn't written as.factor within aes I get an error message saying Continuous value supplied to discrete scale.
q <- ggtree(rerooted_tree)
q <- q %<+% All.my_DF +
geom_tippoint(aes(color = domains.present), size = 1)
q
I'm hoping this is just a syntax issue, but I'm working on getting a reprex together to add if needed. This is a very similar issue to this post, but the OP there solved it without ggtree (I'd rather keep it simple if possible). Thank you in advance!

I ran into the same issue recently and having the branch colour defined outside of aes() worked for me:
p <- ggtree(rerooted_tree, color = support)
p <- p %<+% my_DF +
geom_tippoint(aes(fill = as.factor(domains.present)))
p

Related

how to mimic histogram plot from flowjo in R using flowCore?

I'm new to flowCore + R. I would like to mimic a histogram plot after gating that can be manually done in FlowJo software. I got something similar but it doesn't look quite right because it is a "density" plot and is shifted. How can I get the x axis to shift over and look similar to how FlowJo outputs the plot? I tried reading this document but couldn't find a plot similar to the one in FlowJo: howtoflowcore Appreciate any guidance. Thanks.
code snippet:
library(flowCore)
parentpath <- "/parent/path"
subfolder <- "Sample 1"
fcs_files <- list.files(paste0(parentpath, subfolder), pattern = ".fcs")
fs <- read.flowSet(fcs_files)
rect.g <- rectangleGate(filterId = "main",list("FSC-A" = c(1e5, 2e5), "SSC-A" = c(3e4,1e5)))
fs_sub <- Subset(fs, rect.g)
p <- ggcyto(fs_sub[[15]], aes(x= `UV-379-A`)) +
geom_density(fill='black', alpha = 0.4) +
ggcyto_par_set(limits = list(x = c(-1e3, 5e4), y = c(0, 6e-5)))
p
FlowJo output:
R FlowCore output:
The reason that for the "shift" is that the x axis is logarithmic (base 10) in the flowJo graph. To achieve the same result in R, add
+ scale_x_log10()
after the existing code. This might interact weirdly with the axis limits you've set, so bare that in mind.
To make the y-axis "count" rather than density, you can change the first line of your ggcyto() call to:
aes(x= `UV-379-A`, y = after_stat(count))
Let me know if that works - I don't have your data to hand so that's all from memory!
For any purely aesthetic changes, they are relatively easy to look up.

Plot a table with box size changing

Does anyone have an idea how is this kind of chart plotted? It seems like heat map. However, instead of using color, size of each cell is used to indicate the magnitude. I want to plot a figure like this but I don't know how to realize it. Can this be done in R or Matlab?
Try scatter:
scatter(x,y,sz,c,'s','filled');
where x and y are the positions of each square, sz is the size (must be a vector of the same length as x and y), and c is a 3xlength(x) matrix with the color value for each entry. The labels for the plot can be input with set(gcf,properties) or xticklabels:
X=30;
Y=10;
[x,y]=meshgrid(1:X,1:Y);
x=reshape(x,[size(x,1)*size(x,2) 1]);
y=reshape(y,[size(y,1)*size(y,2) 1]);
sz=50;
sz=sz*(1+rand(size(x)));
c=[1*ones(length(x),1) repmat(rand(size(x)),[1 2])];
scatter(x,y,sz,c,'s','filled');
xlab={'ACC';'BLCA';etc}
xticks(1:X)
xticklabels(xlab)
set(get(gca,'XLabel'),'Rotation',90);
ylab={'RAPGEB6';etc}
yticks(1:Y)
yticklabels(ylab)
EDIT: yticks & co are only available for >R2016b, if you don't have a newer version you should use set instead:
set(gca,'XTick',1:X,'XTickLabel',xlab,'XTickLabelRotation',90) %rotation only available for >R2014b
set(gca,'YTick',1:Y,'YTickLabel',ylab)
in R, you should use ggplot2 that allows you to map your values (gene expression in your case?) onto the size variable. Here, I did a simulation that resembles your data structure:
my_data <- matrix(rnorm(8*26,mean=0,sd=1), nrow=8, ncol=26,
dimnames = list(paste0("gene",1:8), LETTERS))
Then, you can process the data frame to be ready for ggplot2 data visualization:
library(reshape)
dat_m <- melt(my_data, varnames = c("gene", "cancer"))
Now, use ggplot2::geom_tile() to map the values onto the size variable. You may update additional features of the plot.
library(ggplot2)
ggplot(data=dat_m, aes(cancer, gene)) +
geom_tile(aes(size=value, fill="red"), color="white") +
scale_fill_discrete(guide=FALSE) + ##hide scale
scale_size_continuous(guide=FALSE) ##hide another scale
In R, corrplotpackage can be used. Specifically, you have to use method = 'square' when creating the plot.
Try this as an example:
library(corrplot)
corrplot(cor(mtcars), method = 'square', col = 'red')

Cannot save plots as pdf when ggplot function is called inside a function

I am going to plot a boxplot from a 4-column matrix pl1 using ggplot with dots on each box. The instruction for plotting is like this:
p1 <- ggplot(pl1, aes(x=factor(Edge_n), y=get(make.names(y_label)), ymax=max(get(make.names(y_label)))*1.05))+
geom_boxplot(aes(fill=method), outlier.shape= NA)+
theme(text = element_text(size=20), aspect.ratio=1)+
xlab("Number of edges")+
ylab(y_label)+
scale_fill_manual(values=color_box)+
geom_point(aes(x=factor(Edge_n), y=get(make.names(true_des)), ymax=max(get(make.names(true_des)))*1.05, color=method),
position = position_dodge(width=0.75))+
scale_color_manual(values=color_pnt)
Then, I use print(p1) to print it on an opened pdf. However, this does not work for me and I get the below error:
Error in make.names(true_des) : object 'true_des' not found
Does anyone can help?
Your example is not very clear because you give a call but you don't show the values of your variables so it's really hard to figure out what you're trying to do (for instance, is method the name of a column in the data frame pl1, or is it a variable (and if it's a variable, what is its type? string? name?)).
Nonetheless, here's an example that should help set you on the way to doing what you want:
Try something like this:
pl1 <- data.frame(Edge_n = sample(5, 20, TRUE), foo = rnorm(20), bar = rnorm(20))
y_label <- 'foo'
ax <- do.call(aes, list(
x=quote(factor(Edge_n)),
y=as.name(y_label),
ymax = substitute(max(y)*1.05, list(y=as.name(y_label)))))
p1 <- ggplot(pl1) + geom_boxplot(ax)
print(p1)
This should get you started to figuring out the rest of what you're trying to do.
Alternately (a different interpretation of your question) is that you may be running into a problem with the environment in which aes evaluates its arguments. See https://github.com/hadley/ggplot2/issues/743 for details. If this is the issue, then the answer might to override the default value of the environment argument to aes, for instance: aes(x=factor(Edge_n), y=get(make.names(y_label)), ymax=max(get(make.names(y_label)))*1.05, environment=environment())

Weird ggplot2 error: Empty raster

Why does
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(1.5,1.5)),aes(x=x,y=y,color=z)) +
geom_point()
give me the error
Error in grid.Call.graphics(L_raster, x$raster, x$x, x$y, x$width, x$height, : Empty raster
but the following two plots work
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(2.5,2.5)),aes(x=x,y=y,color=z)) +
geom_point()
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(1.5,2.5)),aes(x=x,y=y,color=z)) +
geom_point()
I'm using ggplot2 0.9.3.1
TL;DR: Check your data -- do you really want to use a continuous color scale with only one possible value for the color?
The error does not occur if you add + scale_fill_continuous(guide=FALSE) to the plot. (This turns off the legend.)
ggplot(data.frame(x=c(1,2), y=c(1,2), z=c(1.5,1.5)), aes(x=x,y=y,color=z)) +
geom_point() + scale_color_continuous(guide = FALSE)
The error seems to be triggered in cases where a continuous color scale uses only one color. The current GitHub version already includes the relevant pull request. Install it via:
devtools::install_github("hadley/ggplot2")
But more probably there is an issue with the data: why would you use a continuous color scale with only one value?
The same behaviour (i.e. the "Empty raster"error) appeared to me with another value apart from 1.5.
Try the following:
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(0.02,0.02)),aes(x=x,y=y,color=z))
+ geom_point()
And you get again the same error (tried with both 0.9.3.1 and 1.0.0.0 versions) so it looks like a nasty and weird bug.
This definitely sounds like an edge case better suited for a bug report as others have mentioned but here's some generalizable code that might be useful to somebody as a clunky workaround or for handling labels/colors. It's plotting a rescaled variable and using the real values as labels.
require(scales)
z <- c(1.5,1.5)
# rescale z to 0:1
z_rescaled <- rescale(z)
# customizable number of breaks in the legend
max_breaks_cnt <- 5
# break z and z_rescaled by quantiles determined by number of maximum breaks
# and use 'unique' to remove duplicate breaks
breaks_z <- unique(as.vector(quantile(z, seq(0,1,by=1/max_breaks_cnt))))
breaks_z_rescaled <- unique(as.vector(quantile(z_rescaled, seq(0,1,by=1/max_breaks_cnt))))
# make a color palette
Pal <- colorRampPalette(c('yellow','orange','red'))(500)
# plot z_rescaled with breaks_z used as labels
ggplot(data.frame(x=c(1,2),y=c(1,2),z_rescaled),aes(x=x,y=y,color=z_rescaled)) +
geom_point() + scale_colour_gradientn("z",colours=Pal,labels = breaks_z,breaks=breaks_z_rescaled)
This is quite off-topic but I like to use rescaling to send tons of changing variables to a function like this:
colorfunction <- gradient_n_pal(colours = colorRampPalette(c('yellow','orange','red'))(500),
values = c(0:1), space = "Lab")
colorfunction(z_rescaled)

How to plot a violin scatter boxplot (in R)?

I just came by the following plot:
And wondered how can it be done in R? (or other softwares)
Update 10.03.11: Thank you everyone who participated in answering this question - you gave wonderful solutions! I've compiled all the solution presented here (as well as some others I've came by online) in a post on my blog.
Make.Funny.Plot does more or less what I think it should do. To be adapted according to your own needs, and might be optimized a bit, but this should be a nice start.
Make.Funny.Plot <- function(x){
unique.vals <- length(unique(x))
N <- length(x)
N.val <- min(N/20,unique.vals)
if(unique.vals>N.val){
x <- ave(x,cut(x,N.val),FUN=min)
x <- signif(x,4)
}
# construct the outline of the plot
outline <- as.vector(table(x))
outline <- outline/max(outline)
# determine some correction to make the V shape,
# based on the range
y.corr <- diff(range(x))*0.05
# Get the unique values
yval <- sort(unique(x))
plot(c(-1,1),c(min(yval),max(yval)),
type="n",xaxt="n",xlab="")
for(i in 1:length(yval)){
n <- sum(x==yval[i])
x.plot <- seq(-outline[i],outline[i],length=n)
y.plot <- yval[i]+abs(x.plot)*y.corr
points(x.plot,y.plot,pch=19,cex=0.5)
}
}
N <- 500
x <- rpois(N,4)+abs(rnorm(N))
Make.Funny.Plot(x)
EDIT : corrected so it always works.
I recently came upon the beeswarm package, that bears some similarity.
The bee swarm plot is a
one-dimensional scatter plot like
"stripchart", but with closely-packed,
non-overlapping points.
Here's an example:
library(beeswarm)
beeswarm(time_survival ~ event_survival, data = breast,
method = 'smile',
pch = 16, pwcol = as.numeric(ER),
xlab = '', ylab = 'Follow-up time (months)',
labels = c('Censored', 'Metastasis'))
legend('topright', legend = levels(breast$ER),
title = 'ER', pch = 16, col = 1:2)
(source: eklund at www.cbs.dtu.dk)
I have come up with the code similar to Joris, still I think this is more than a stem plot; here I mean that they y value in each series is a absolute value of a distance to the in-bin mean, and x value is more about whether the value is lower or higher than mean.
Example code (sometimes throws warnings but works):
px<-function(x,N=40,...){
x<-sort(x);
#Cutting in bins
cut(x,N)->p;
#Calculate the means over bins
sapply(levels(p),function(i) mean(x[p==i]))->meansl;
means<-meansl[p];
#Calculate the mins over bins
sapply(levels(p),function(i) min(x[p==i]))->minl;
mins<-minl[p];
#Each dot is one value.
#X is an order of a value inside bin, moved so that the values lower than bin mean go below 0
X<-rep(0,length(x));
for(e in levels(p)) X[p==e]<-(1:sum(p==e))-1-sum((x-means)[p==e]<0);
#Y is a bin minum + absolute value of a difference between value and its bin mean
plot(X,mins+abs(x-means),pch=19,cex=0.5,...);
}
Try the vioplot package:
library(vioplot)
vioplot(rnorm(100))
(with awful default color ;-)
There is also wvioplot() in the wvioplot package, for weighted violin plot, and beanplot, which combines violin and rug plots. They are also available through the lattice package, see ?panel.violin.
Since this hasn't been mentioned yet, there is also ggbeeswarm as a relatively new R package based on ggplot2.
Which adds another geom to ggplot to be used instead of geom_jitter or the like.
In particular geom_quasirandom (see second example below) produces really good results and I have in fact adapted it as default plot.
Noteworthy is also the package vipor (VIolin POints in R) which produces plots using the standard R graphics and is in fact also used by ggbeeswarm behind the scenes.
set.seed(12345)
install.packages('ggbeeswarm')
library(ggplot2)
library(ggbeeswarm)
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()

Resources