How to mimic a histogram plot from FlowJo in R using flowCore?

I'm new to flowCore + R. I would like to mimic a histogram plot after gating that can be manually done in FlowJo software. I got something similar but it doesn't look quite right because it is a "density" plot and is shifted. How can I get the x axis to shift over and look similar to how FlowJo outputs the plot? I tried reading this document but couldn't find a plot similar to the one in FlowJo: howtoflowcore Appreciate any guidance. Thanks.
code snippet:
library(flowCore)
library(ggcyto) # provides ggcyto() and ggcyto_par_set()
parentpath <- "/parent/path"
subfolder <- "Sample 1"
# file.path() inserts the path separator; full.names = TRUE returns paths that can be read directly
fcs_files <- list.files(file.path(parentpath, subfolder), pattern = "\\.fcs$", full.names = TRUE)
fs <- read.flowSet(fcs_files)
rect.g <- rectangleGate(filterId = "main", list("FSC-A" = c(1e5, 2e5), "SSC-A" = c(3e4, 1e5)))
fs_sub <- Subset(fs, rect.g)
p <- ggcyto(fs_sub[[15]], aes(x = `UV-379-A`)) +
  geom_density(fill = "black", alpha = 0.4) +
  ggcyto_par_set(limits = list(x = c(-1e3, 5e4), y = c(0, 6e-5)))
p
FlowJo output:
R FlowCore output:

The reason for the "shift" is that the x-axis is logarithmic (base 10) in the FlowJo graph. To achieve the same result in R, add
+ scale_x_log10()
after the existing code. This might interact badly with the axis limits you've set (a log axis cannot show the negative lower limit of -1e3), so bear that in mind.
To make the y-axis "count" rather than density, you can change the first line of your ggcyto() call to:
aes(x= `UV-379-A`, y = after_stat(count))
Let me know if that works - I don't have your data to hand so that's all from memory!
Purely aesthetic changes are relatively easy to look up.
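To see what the two suggestions do without any cytometry data, here is a minimal base R sketch. It assumes simulated log-normal values as a stand-in for the gated `UV-379-A` channel (the variable name `intensity` is hypothetical), bins them on a log10 scale, and plots event counts rather than densities.

```r
# Simulated, roughly log-normal fluorescence values (hypothetical stand-in
# for the gated `UV-379-A` channel; not real cytometry data).
set.seed(1)
intensity <- rlnorm(5000, meanlog = log(3000), sdlog = 0.8)

# Bin on log10(intensity) so that bins are equally wide on a log axis.
h <- hist(log10(intensity), breaks = 60, plot = FALSE)

# Plot raw event counts per bin (not densities) and label the axis in decades.
plot(h, freq = TRUE, col = "grey40", border = NA, xaxt = "n",
     main = "", xlab = "Intensity (log10 scale)", ylab = "Count")
decades <- seq(floor(min(h$breaks)), ceiling(max(h$breaks)))
axis(1, at = decades, labels = 10^decades)
```

Because the binning happens on log10(intensity), only strictly positive values can be shown, which is the same restriction that applies to scale_x_log10().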

Related

Why aren't any points showing up in the qqcomp function when using plotstyle="ggplot"?

I want to compare the fit of different distributions to my data in a single plot. The qqcomp function from the fitdistrplus package pretty much does exactly what I want to do. The only problem I have however, is that it's mostly written using base R plot and all my other plots are written in ggplot2. I basically just want to customize the qqcomp plots to look like they have been made in ggplot2.
From the documentation (https://www.rdocumentation.org/packages/fitdistrplus/versions/1.0-14/topics/graphcomp) I get that this is totally possible by setting plotstyle="ggplot". If I do this however, no points are showing up on the plot, even though it worked perfectly without the plotstyle argument. Here is a little example to visualize my problem:
library(fitdistrplus)
library(ggplot2)
set.seed(42)
vec <- rgamma(100, shape=2)
fit.norm <- fitdist(vec, "norm")
fit.gamma <- fitdist(vec, "gamma")
fit.weibull <- fitdist(vec, "weibull")
model.list <- list(fit.norm, fit.gamma, fit.weibull)
qqcomp(model.list)
This gives the following output:
While this:
qqcomp(model.list, plotstyle="ggplot")
gives the following output:
Why are the points not showing up? Am I doing something wrong here or is this a bug?
EDIT:
So I haven't figured out why this doesn't work, but there is a pretty easy workaround. The call qqcomp(model.list, plotstyle="ggplot") still returns a ggplot object, which includes the data used to make the plot. Using that data, one can easily write a custom plotting function that does exactly what one wants. It's not very elegant, but until someone finds out why it's not working as expected I will just use this method.
I was able to reproduce your error, and it is indeed intriguing. You may want to contact the developers of the package to report this bug.
Otherwise, you can reproduce this Q-Q plot using ggplot2 and stat_qq, passing the corresponding distribution function and its associated parameters (stored in $estimate):
library(ggplot2)
df = data.frame(vec)
ggplot(df, aes(sample = vec))+
stat_qq(distribution = qgamma, dparams = as.list(fit.gamma$estimate), color = "green")+
stat_qq(distribution = qnorm, dparams = as.list(fit.norm$estimate), color = "red")+
stat_qq(distribution = qweibull, dparams = as.list(fit.weibull$estimate), color = "blue")+
geom_abline(slope = 1, color = "black")+
labs(title = "Q-Q Plots", x = "Theoretical quantiles", y = "Empirical quantiles")
Hope it will help you.
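For reference, this is roughly what a Q-Q layer computes, sketched in base R for the normal fit only. One assumption to note: the mean and standard deviation are plain moment estimates standing in for fit.norm$estimate, so that the sketch needs no extra packages.

```r
set.seed(42)
vec <- rgamma(100, shape = 2)

# Assumption: moment estimates stand in for the fitdist() estimates here,
# so that the sketch needs no extra packages.
mu <- mean(vec)
sigma <- sd(vec)

n <- length(vec)
probs <- ppoints(n)                            # plotting positions in (0, 1)
theoretical <- qnorm(probs, mean = mu, sd = sigma)
empirical <- sort(vec)

plot(theoretical, empirical, pch = 19, col = "red",
     xlab = "Theoretical quantiles", ylab = "Empirical quantiles")
abline(0, 1)                                   # points on this line match the fit
```

Swapping qnorm for qgamma or qweibull (with the matching parameters) gives the other two series.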

Plot a table with box size changing

Does anyone have an idea how this kind of chart is plotted? It looks like a heat map, but instead of color, the size of each cell indicates the magnitude. I want to plot a figure like this but I don't know how to do it. Can this be done in R or Matlab?
Try scatter:
scatter(x,y,sz,c,'s','filled');
where x and y are the positions of each square, sz is the size (must be a vector of the same length as x and y), and c is a length(x)-by-3 matrix with the RGB color value for each entry. The labels for the plot can be set with set(gca,properties) or xticklabels:
X=30;
Y=10;
[x,y]=meshgrid(1:X,1:Y);
x=reshape(x,[size(x,1)*size(x,2) 1]);
y=reshape(y,[size(y,1)*size(y,2) 1]);
sz=50;
sz=sz*(1+rand(size(x)));
c=[1*ones(length(x),1) repmat(rand(size(x)),[1 2])];
scatter(x,y,sz,c,'s','filled');
xlab={'ACC';'BLCA';etc}
xticks(1:X)
xticklabels(xlab)
xtickangle(90); % rotate the tick labels (R2016b or later)
ylab={'RAPGEB6';etc}
yticks(1:Y)
yticklabels(ylab)
EDIT: xticks, yticks and the related functions are only available from R2016b onwards; on an older version, use set instead:
set(gca,'XTick',1:X,'XTickLabel',xlab,'XTickLabelRotation',90) % rotation only available from R2014b onwards
set(gca,'YTick',1:Y,'YTickLabel',ylab)
In R, you can use ggplot2, which lets you map your values (gene expression in your case?) onto the size variable. Here, I simulated data that resembles your data structure:
my_data <- matrix(rnorm(8*26,mean=0,sd=1), nrow=8, ncol=26,
dimnames = list(paste0("gene",1:8), LETTERS))
Then, you can process the data frame to be ready for ggplot2 data visualization:
library(reshape)
dat_m <- melt(my_data, varnames = c("gene", "cancer"))
Now, use ggplot2::geom_tile() to map the values onto the size aesthetic. Note that for geom_tile() size controls the border width, so with a white border larger values leave a smaller visible square. You may update additional features of the plot.
library(ggplot2)
ggplot(data=dat_m, aes(cancer, gene)) +
geom_tile(aes(size=value, fill="red"), color="white") +
scale_fill_discrete(guide="none") + ## hide the fill legend
scale_size_continuous(guide="none") ## hide the size legend
In R, the corrplot package can be used. Specifically, use method = 'square' when creating the plot.
Try this as an example:
library(corrplot)
corrplot(cor(mtcars), method = 'square', col = 'red')
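A base R variant of the same idea is also possible: symbols() can draw squares whose side length encodes the value. The sketch below is only an illustration on simulated data; the scaling rule (side length proportional to the absolute value, capped at 0.9 of a cell) is an assumption, not something taken from the original figure.

```r
set.seed(1)
# Simulated matrix: 8 genes x 26 cancer types (placeholder names).
my_data <- matrix(rnorm(8 * 26), nrow = 8, ncol = 26,
                  dimnames = list(paste0("gene", 1:8), LETTERS))

# One entry per cell: column index on x, row index on y.
cells <- expand.grid(gene = seq_len(nrow(my_data)),
                     cancer = seq_len(ncol(my_data)))
value <- my_data[cbind(cells$gene, cells$cancer)]

# Side length proportional to |value|, capped at 0.9 so neighbours never touch.
side <- 0.9 * abs(value) / max(abs(value))

# inches = FALSE means the side lengths are in x-axis units; asp = 1 keeps
# the symbols square on screen.
symbols(cells$cancer, cells$gene, squares = side, inches = FALSE,
        bg = "red", fg = "white", asp = 1, axes = FALSE,
        xlab = "", ylab = "")
axis(1, at = seq_len(ncol(my_data)), labels = colnames(my_data), las = 2)
axis(2, at = seq_len(nrow(my_data)), labels = rownames(my_data), las = 1)
```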

Setting equal xlim and ylim in plot function

Is there a way to get the plot function to generate equal xlim and ylim automatically?
I do not want to define a fixed range beforehand; I want the plot function to decide on the range itself. However, I expect it to pick the same range for x and y.
A possible solution is to define a wrapper to the plot function:
plot.Custom <- function(x, y, ...) {
.limits <- range(x, y)
plot(x, y, xlim = .limits, ylim = .limits, ...)
}
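As a usage sketch of the same idea, the limits can also be computed by a small helper and passed to plot() directly. The name equal_limits is hypothetical, and ignoring NA and infinite values via finite = TRUE is an added assumption about what is wanted.

```r
# Hypothetical helper (not part of base R): one common range for both axes,
# ignoring NA and infinite values.
equal_limits <- function(x, y) range(x, y, finite = TRUE)

lims <- equal_limits(cars$speed, cars$dist)
plot(cars$speed, cars$dist, xlim = lims, ylim = lims,
     xlab = "speed", ylab = "dist")
```

Both axes then span the combined range of the two variables, so the wider variable determines the limits.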
One way is to manipulate interactively and then choose the right one. A slider will appear once you run the following code.
library(manipulate)
manipulate(
plot(cars, xlim=c(x.min,x.max)),
x.min=slider(0,15),
x.max=slider(15,30))
I'm not aware of any way to do this using plot() (which doesn't mean there isn't one). ggplot2 might be the way to go; it lends itself more to being changed retroactively, since it is designed around a layer system.
library(ggplot2)
#Creating our ggplot object
loop_plot <- ggplot(cars, aes(x = speed, y = dist)) +
geom_point()
#pulling out the 'auto' x & y axis limits
#(ggplot2 >= 3.0; older versions stored these under $panel$ranges)
rangepull <- t(cbind(
ggplot_build(loop_plot)$layout$panel_params[[1]]$x.range,
ggplot_build(loop_plot)$layout$panel_params[[1]]$y.range))
#taking the max and min(so we don't cut out data points)
newrange <- list(cor.min = min(rangepull[,1]), cor.max = max(rangepull[,2]))
#changing our plot size to be nice and symmetric
loop_plot <- loop_plot +
xlim(newrange$cor.min, newrange$cor.max) +
ylim(newrange$cor.min, newrange$cor.max)
Note that the loop_plot object is of class ggplot, and won't actually be drawn until it is printed.
I used the cars dataset in the code above to show what's going on, but just sub in your data set[s] and then do whatever post-processing your end goal requires.
You'll also be able to add in titles and the like based off of the dataset name et cetera which will likely end up producing a clearer visualization out of your loop.
Hopefully this works for your needs.

How can I overlay two dense scatter plots so that I can see the outlines of each in R or Matlab?

See this example
This was created in matlab by making two scatter plots independently, creating images of each, then using the imagesc to draw them into the same figure and then finally setting the alpha of the top image to 0.5.
I would like to do this in R or matlab without using images, since creating an image does not preserve the axis scale information, nor can I overlay a grid (e.g. using 'grid on' in matlab). Ideally I would like to do this properly in matlab, but would also be happy with a solution in R. It seems like it should be possible but I can't for the life of me figure it out.
So generally, I would like to be able to set the alpha of an entire plotted object (i.e. of a matlab plot handle in matlab parlance...)
Thanks,
Ben.
EDIT: The data in the above example is actually 2D. The plotted points are from a computer simulation. Each point represents 'amplitude' (y-axis) (an emergent property specific to the simulation I'm running), plotted against 'performance' (x-axis).
EDIT 2: There are 1796400 points in each data set.
Using ggplot2 you can add together two geom_point layers and make them transparent using the alpha parameter. ggplot2 also adds up transparency, and I think this is what you want. This should work, although I haven't run this.
library(ggplot2)
dat = data.frame(x = runif(1000), y = runif(1000), cat = rep(c("A","B"), each = 500))
ggplot(aes(x = x, y = y, color = cat), data = dat) + geom_point(alpha = 0.3)
ggplot2 is awesome!
This is an example of calculating and drawing a convex hull:
library(automap)
library(ggplot2)
library(plyr)
loadMeuse()
theme_set(theme_bw())
meuse = as.data.frame(meuse)
chull_per_soil = ddply(meuse, .(soil),
function(sub) sub[chull(sub$x, sub$y),c("x","y")])
ggplot(aes(x = x, y = y), data = meuse) +
geom_point(aes(size = log(zinc), color = ffreq)) +
geom_polygon(aes(color = soil), data = chull_per_soil, fill = NA) +
coord_equal()
which leads to the following illustration:
You could first export the two data sets as bitmap images, re-import them, add transparency:
library(grid)
N <- 1e7 # Warning: slow
d <- data.frame(x1=rnorm(N),
x2=rnorm(N, 0.8, 0.9),
y=rnorm(N, 0.8, 0.2),
z=rnorm(N, 0.2, 0.4))
v <- with(d, dataViewport(c(x1,x2),c(y, z)))
png("layer1.png", bg="transparent")
with(d, grid.points(x1,y, vp=v,default="native",pch=".",gp=gpar(col="blue")))
dev.off()
png("layer2.png", bg="transparent")
with(d, grid.points(x2,z, vp=v,default="native",pch=".",gp=gpar(col="red")))
dev.off()
library(png)
i1 <- readPNG("layer1.png", native=FALSE)
i2 <- readPNG("layer2.png", native=FALSE)
ghostize <- function(r, alpha=0.5)
matrix(adjustcolor(rgb(r[,,1],r[,,2],r[,,3],r[,,4]), alpha.f=alpha), nrow=dim(r)[1])
grid.newpage()
grid.rect(gp=gpar(fill="white"))
grid.raster(ghostize(i1))
grid.raster(ghostize(i2))
You can add these as layers in, say, ggplot2.
Use the transparency capability of color descriptions. You can define a color as a string of four two-digit hexadecimal values: muddy <- "#888888FF". The first three pairs set the RGB channels (00 to FF); the final pair sets the alpha (opacity) level, from 00 (fully transparent) to FF (fully opaque).
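A minimal base R sketch of this approach, on simulated points rather than the simulation data from the question: the alpha channel is attached to the colour itself, so the axes keep their scale and a grid can be drawn on top. The alpha value of 0.3 and the variable names are arbitrary choices for illustration.

```r
# Three ways to write a semi-transparent blue.
blue_hex <- "#0000FF4D"                          # RGB 00/00/FF plus an alpha pair
blue_rgb <- rgb(0, 0, 1, alpha = 0.3)            # alpha on a 0-1 scale
blue_adj <- adjustcolor("blue", alpha.f = 0.3)   # add alpha to a named colour

set.seed(1)
n <- 2000
# First cloud in blue, second in red; overlapping regions blend.
plot(rnorm(n), rnorm(n), pch = 16, col = blue_adj,
     xlim = c(-4, 5), ylim = c(-4, 5), xlab = "x", ylab = "y")
points(rnorm(n, mean = 1), rnorm(n, mean = 1), pch = 16,
       col = adjustcolor("red", alpha.f = 0.3))
grid()                                           # axes and grid stay intact
```

With millions of points, a lower alpha and a small plotting symbol such as pch = "." would likely be needed to keep the outlines visible.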
AFAIK, your best option with Matlab is to write your own plot function. At the time of writing, scatter plot points did not have a transparency attribute, so you could not change it directly (newer releases expose MarkerFaceAlpha and MarkerEdgeAlpha on scatter objects). However, if you, most crudely, write a bunch of loops that draw many tiny circles, you can easily give them an alpha value and obtain a transparent set of data points.

How to plot a violin scatter boxplot (in R)?

I just came by the following plot:
And wondered how it can be done in R (or other software)?
Update 10.03.11: Thank you to everyone who participated in answering this question - you gave wonderful solutions! I've compiled all the solutions presented here (as well as some others I've come across online) in a post on my blog.
Make.Funny.Plot does more or less what I think it should do. To be adapted according to your own needs, and might be optimized a bit, but this should be a nice start.
Make.Funny.Plot <- function(x){
unique.vals <- length(unique(x))
N <- length(x)
N.val <- min(N/20,unique.vals)
if(unique.vals>N.val){
x <- ave(x,cut(x,N.val),FUN=min)
x <- signif(x,4)
}
# construct the outline of the plot
outline <- as.vector(table(x))
outline <- outline/max(outline)
# determine some correction to make the V shape,
# based on the range
y.corr <- diff(range(x))*0.05
# Get the unique values
yval <- sort(unique(x))
plot(c(-1,1),c(min(yval),max(yval)),
type="n",xaxt="n",xlab="")
for(i in 1:length(yval)){
n <- sum(x==yval[i])
x.plot <- seq(-outline[i],outline[i],length=n)
y.plot <- yval[i]+abs(x.plot)*y.corr
points(x.plot,y.plot,pch=19,cex=0.5)
}
}
N <- 500
x <- rpois(N,4)+abs(rnorm(N))
Make.Funny.Plot(x)
EDIT : corrected so it always works.
I recently came upon the beeswarm package, that bears some similarity.
The bee swarm plot is a one-dimensional scatter plot like "stripchart", but with closely-packed, non-overlapping points.
Here's an example:
library(beeswarm)
beeswarm(time_survival ~ event_survival, data = breast,
method = 'smile',
pch = 16, pwcol = as.numeric(ER),
xlab = '', ylab = 'Follow-up time (months)',
labels = c('Censored', 'Metastasis'))
legend('topright', legend = levels(breast$ER),
title = 'ER', pch = 16, col = 1:2)
(source: eklund at www.cbs.dtu.dk)
I have come up with code similar to Joris's, but I think this is more than a stem plot: the y value in each series is the absolute value of the distance to the in-bin mean, and the x value reflects whether the value is lower or higher than the mean.
Example code (sometimes throws warnings but works):
px<-function(x,N=40,...){
x<-sort(x);
#Cutting in bins
cut(x,N)->p;
#Calculate the means over bins
sapply(levels(p),function(i) mean(x[p==i]))->meansl;
means<-meansl[p];
#Calculate the mins over bins
sapply(levels(p),function(i) min(x[p==i]))->minl;
mins<-minl[p];
#Each dot is one value.
#X is an order of a value inside bin, moved so that the values lower than bin mean go below 0
X<-rep(0,length(x));
for(e in levels(p)) X[p==e]<-(1:sum(p==e))-1-sum((x-means)[p==e]<0);
#Y is a bin minimum + absolute value of a difference between value and its bin mean
plot(X,mins+abs(x-means),pch=19,cex=0.5,...);
}
Try the vioplot package:
library(vioplot)
vioplot(rnorm(100))
(with awful default color ;-)
There is also wvioplot() in the wvioplot package, for weighted violin plot, and beanplot, which combines violin and rug plots. They are also available through the lattice package, see ?panel.violin.
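To show what these violin functions draw conceptually, here is a rough base R sketch: a kernel density estimate mirrored around a vertical axis, with the raw points overlaid. The scaling factor and jitter amount are arbitrary assumptions, and this is not how vioplot itself is implemented.

```r
set.seed(1)
x <- rnorm(100)
d <- density(x)                        # kernel density estimate of x
half_width <- 0.4 * d$y / max(d$y)     # scale so the violin is 0.8 wide

plot(NA, xlim = c(-1, 1), ylim = range(d$x), xaxt = "n",
     xlab = "", ylab = "x")
# Mirror the density around the vertical axis to form the outline.
polygon(c(half_width, -rev(half_width)), c(d$x, rev(d$x)), col = "grey80")
# Overlay the raw points with a little horizontal jitter.
points(jitter(rep(0, length(x)), amount = 0.05), x, pch = 19, cex = 0.5)
```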
Since this hasn't been mentioned yet, there is also ggbeeswarm, a relatively new R package based on ggplot2.
It adds another geom to ggplot2 that can be used instead of geom_jitter or the like.
In particular, geom_quasirandom (see the second example below) produces really good results, and I have in fact adopted it as my default plot.
Noteworthy is also the package vipor (VIolin POints in R) which produces plots using the standard R graphics and is in fact also used by ggbeeswarm behind the scenes.
# install.packages('ggbeeswarm') # run once if the package is not installed
set.seed(12345)
library(ggplot2)
library(ggbeeswarm)
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()
