Overlapping stacked density plots - r

I'm trying to achieve a similar plot to this one, using R's native plot command.
I was able to get something similar with the code below, however, I'd like the density polygons to overlap. Can anyone suggest a way to do this?
data = lapply(1:5, function(x) density(rnorm(100, mean = x)))
par(mfrow=c(5,1))
for(i in 1:length(data)){
  plot(data[[i]], xaxt='n', yaxt='n', main='', xlim=c(-2, 8), xlab='', ylab='', bty='n', lwd=1)
  polygon(data[[i]], col=rgb(0,0,0,.4), border=NA)
  abline(h=0, lwd=0.5)
}

I would do something like the following. I plot all the densities in the same plot but add an integer offset to the y values. To make them overlap, I multiply the densities by a constant factor fac.
# Create your toy data
data <- lapply(1:5, function(x) density(rnorm(100, mean = x)))
fac <- 5 # A factor to make the densities overlap
# Make an empty plot
plot(1, type = "n", xlim = c(-3, 10), ylim = c(1, length(data) + 2),
     axes = FALSE, xlab = "", ylab = "")
# Add each density, shifted by i and scaled by fac
for(i in 1:length(data)){
  lines(data[[i]]$x, fac*data[[i]]$y + i)
  polygon(data[[i]]$x, fac*data[[i]]$y + i, col = rgb(0, 0, 0, 0.4), border = NA)
  abline(h = i, lwd = 0.5)
}
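Instead of tuning fac by trial and error, one option (an assumption, not part of the answer above) is to derive it from the tallest density. The baselines are 1 apart, so the tallest ridge reaches into the next row as soon as fac * max(y) > 1. The `overlap` value below is a hypothetical tuning parameter: the height of the tallest peak, measured in rows.

```r
# Sketch: derive fac so the tallest ridge rises `overlap` row spacings.
set.seed(42)
data <- lapply(1:5, function(x) density(rnorm(100, mean = x)))

overlap  <- 1.5   # hypothetical tuning value: tallest peak height, in rows
max_peak <- max(sapply(data, function(d) max(d$y)))
fac      <- overlap / max_peak   # baselines are 1 apart, so overlap > 1 means ridges cross

plot(1, type = "n", xlim = range(sapply(data, function(d) range(d$x))),
     ylim = c(1, length(data) + overlap), axes = FALSE, xlab = "", ylab = "")
for (i in seq_along(data)) {
  y <- fac * data[[i]]$y + i      # shift to baseline i, scale by the common factor
  polygon(data[[i]]$x, y, col = rgb(0, 0, 0, 0.4), border = NA)
  lines(data[[i]]$x, y)
  abline(h = i, lwd = 0.5)
}
```

Because every density shares one fac, their relative heights are preserved; only the overall vertical scale changes.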

(Note: This content was previously edited into the question and was written by @by0.)
Thanks to @AEBilgrau, I quickly put together this function, which works nicely. Note: you may need to adjust the factor fac to suit your data.
stacked.density <- function(data, fac = 3, xlim, col = 'black',
                            alpha = 0.4, show.xaxis = TRUE,
                            xlab = '', ylab = ''){
  xvals = unlist(lapply(data, function(d) d$x))
  if(missing(xlim)) xlim = c(min(xvals), max(xvals))
  # add transparency; adjustcolor() is in base R (grDevices), col2alpha() is not part of base R
  col = adjustcolor(col, alpha.f = alpha)
  if(length(col) == 1) col = rep(col, length(data))
  plot(1, type = "n", xlim = xlim, ylim = c(1, length(data) + 2),
       yaxt = 'n', bty = 'n', xaxt = ifelse(show.xaxis, 's', 'n'),
       xlab = xlab, ylab = ylab)
  z = length(data):1  # reverse order: the first data set ends up on the top row
  for(i in 1:length(data)){
    d = data[[ z[i] ]]
    lines(d$x, fac*d$y + i, lwd = 1)
    polygon(d$x, fac*d$y + i, col = col[i], border = NA)
    abline(h = i, lwd = 0.5)
  }
}
data <- lapply(1:5, function(x) density(rnorm(100, mean = x)))
stacked.density(data, col=c('red', 'purple', 'blue', 'green', 'yellow'), alpha=0.3, show.xaxis=T)
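With opaque fills, draw order decides which ridge hides which. A possible variation (not part of the function above) is to draw from the top row down, so each lower, "front" ridge paints over the one behind it. The function name ridgeline and its arguments are hypothetical; only base graphics calls are used.

```r
# Sketch: opaque ridgeline plot; rows drawn from back (top) to front (bottom).
ridgeline <- function(data, fac = 3, fill = "grey80", xlim = NULL) {
  n <- length(data)
  if (is.null(xlim)) xlim <- range(unlist(lapply(data, `[[`, "x")))
  fill <- rep_len(fill, n)
  plot(1, type = "n", xlim = xlim, ylim = c(1, n + 2),
       yaxt = "n", bty = "n", xlab = "", ylab = "")
  for (i in n:1) {                      # top row first, bottom row last
    y <- fac * data[[i]]$y + i
    polygon(data[[i]]$x, y, col = fill[i], border = NA)  # opaque: hides rows behind
    lines(data[[i]]$x, y)
    abline(h = i, lwd = 0.5)
  }
  invisible(n:1)                        # drawing order, returned for inspection
}

set.seed(1)
data <- lapply(1:5, function(x) density(rnorm(100, mean = x)))
ord <- ridgeline(data, fill = c("red", "purple", "blue", "green", "yellow"))
```

Drawing back to front avoids needing transparency at all; the semi-transparent approach above instead lets every density stay visible where they overlap.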

Related

Draw discrete CDF in R

I want to draw a CDF in R, but I am having some problems. I want it to look like this:
But when I use plot(x,y,type="s"), I get vertical lines connecting the open and closed points.
So how do I get rid of those lines?
This isn't a general purpose example, but it will show you how to build the plot you desire in a couple of steps.
First, let's create some data (notice the zeros at the beginning):
x <- 0:6
fx <- c(0, 0.19, 0.21, 0.4, 0.12, 0.05, 0.03)
Fx <- cumsum(fx)
n <- length(x)
Then let's make an empty plot
plot(x = NA, y = NA, pch = NA,
xlim = c(0, max(x)),
ylim = c(0, 1),
xlab = "X label",
ylab = "Y label",
main = "Title")
Add closed circles, open circles, and finally the horizontal lines
points(x = x[-n], y = Fx[-1], pch=19)
points(x = x[-1], y = Fx[-1], pch=1)
for(i in 1:(n-1)) points(x=x[i+0:1], y=Fx[c(i,i)+1], type="l")
Voilà!
If you insist on not seeing the line "inside" of the white points, do this instead:
points(x = x[-n], y = Fx[-1], pch=19)
for(i in 1:(n-1)) points(x=x[i+0:1], y=Fx[c(i,i)+1], type="l")
points(x = x[-1], y = Fx[-1], pch=19, col="white")
points(x = x[-1], y = Fx[-1], pch=1)
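The loop and the white-point trick can both be folded into vectorized calls. A sketch using the same data: segments() draws all treads at once, and pch = 21 is a circle whose interior is filled with bg, so a white fill hides the end of each tread.

```r
# Sketch: same discrete CDF, with the loop replaced by one segments() call.
x  <- 0:6
fx <- c(0, 0.19, 0.21, 0.4, 0.12, 0.05, 0.03)
Fx <- cumsum(fx)
n  <- length(x)

plot(NULL, xlim = c(0, max(x)), ylim = c(0, 1),
     xlab = "X label", ylab = "Y label", main = "Title")
# tread i runs from (x[i], Fx[i+1]) to (x[i+1], Fx[i+1]), for i = 1, ..., n-1
segments(x0 = x[-n], y0 = Fx[-1], x1 = x[-1], y1 = Fx[-1])
# open circles at the right ends; the white fill covers the line underneath
points(x[-1], Fx[-1], pch = 21, bg = "white")
points(x[-n], Fx[-1], pch = 19)   # closed circles at the left ends
```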
You can construct this plot using:
plot(x, y, pch = 16, ylim = c(-0.03, 1.03), ylab = "CDF") # solid points/graphic settings
points(x[-1], y[-length(y)]) # open points
abline(h = c(0, 1), col = "grey", lty = 2) # horizontal lines
Note: plot(x, y, type = "s") does not produce a plot like the one in your question; it draws a step function with both treads (horizontal lines) and risers (vertical lines).
Data
library(dplyr)
set.seed(1)
df <- data.frame(x = rpois(30, 3)) %>%
dplyr::arrange(x) %>%
dplyr::add_count(x) %>%
dplyr::distinct(x, .keep_all = T) %>%
mutate(y = cumsum(n) / sum(n))
x <- df$x
y <- df$y
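Without dplyr, base R can build the same plot from the raw observations. This sketch (a swapped-in technique, not the answer's method) uses stats::ecdf(): plot.ecdf() draws only the horizontal treads by default (verticals = FALSE) with a closed point at the left end of each, so only the open circles need to be added. The values of the step function at its jumps should match y above.

```r
# Sketch: the same kind of plot from the raw observations with stats::ecdf().
set.seed(1)
obs   <- rpois(30, 3)        # same toy data as above, before tabulating
Fn    <- ecdf(obs)           # right-continuous step function
jumps <- knots(Fn)           # sorted distinct values of obs

# treads only, with a closed point at the left end of each tread
plot(Fn, verticals = FALSE, pch = 16, main = "", ylab = "CDF")
# open circles at each jump: the value of Fn just before the jump
before <- c(0, Fn(jumps[-length(jumps)]))
points(jumps, before, pch = 1)
```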

How to adjust legend position of interaction.plot and lineplot.CI?

I'm a beginner in coding. I was trying to create an interaction plot. Here's my code:
The data set is clinicaltrial (loaded as clin.trial), from the data accompanying the book "Learning Statistics with R".
library(sciplot)
library(lsr)
library(gplots)
lineplot.CI(x.factor = clin.trial$drug,
response = clin.trial$mood.gain,
group = clin.trial$therapy,
ci.fun = ciMean,
xlab = "Drug",
ylab = "Mood Gain")
and it produces the graph like this:
As can be seen in the graph, the legend box does not fit on my screen.
I also tried creating another plot with the following code:
interaction.plot(x.factor = clin.trial$drug,
trace.factor = clin.trial$therapy,
response = clin.trial$mood.gain,
fun = mean,
type = "l",
lty = 1, # line type
lwd = 2, # line width
legend = T,
xlab = "Drug", ylab = "Mood Gain",
col = c("#00AFBB", "#E7B800"),
xpd = F,
trace.label = "Therapy")
For this code, I got the graph like this:
In this graph, the legend does not have labels.
Could anyone help me with these legend problems?
You probably plan to save the plot via the RStudio GUI. When you resize the plot window with your mouse, you need to run the code again so the legend is redrawn for the new size.
A more robust approach is to write the plot to a graphics device with fixed dimensions, e.g. a PNG file:
library("sciplot")
library("lsr")
library("gplots")
png("Plot_1.png", height=400, width=500)
lineplot.CI(x.factor=clin.trial$drug,
response=clin.trial$mood.gain,
group=clin.trial$therapy,
ci.fun=ciMean,
xlab="Drug",
ylab="Mood Gain"
)
dev.off()
png("Plot_2.png", height=400, width=500)
interaction.plot(x.factor=clin.trial$drug,
trace.factor=clin.trial$therapy,
response=clin.trial$mood.gain,
fun=mean,
type="l",
lty=1, # line type
lwd=2, # line width
legend=T,
xlab="Drug", ylab="Mood Gain",
col=c("#00AFBB", "#E7B800"),
xpd=F,
trace.label="Therapy")
dev.off()
The plots are saved in your working directory; check getwd().
Edit
You could also adjust the legend position.
In lineplot.CI you can use the arguments x.leg and y.leg: either a keyword for x.leg alone, e.g. x.leg = "topleft", or both coordinates as numbers, e.g. x.leg = .8, y.leg = 2.2.
interaction.plot does not provide this functionality yet, so below is a modified version. Its extra arguments are called xleg and yleg and work the same way.
See ?legend for further explanations.
interaction.plot <- function (x.factor, trace.factor, response, fun = mean,
type = c("l", "p", "b", "o", "c"), legend = TRUE,
trace.label = deparse(substitute(trace.factor)),
fixed = FALSE, xlab = deparse(substitute(x.factor)),
ylab = ylabel, ylim = range(cells, na.rm = TRUE),
lty = nc:1, col = 1, pch = c(1L:9, 0, letters),
xpd = NULL, leg.bg = par("bg"), leg.bty = "n",
xtick = FALSE, xaxt = par("xaxt"), axes = TRUE,
xleg=NULL, yleg=NULL, ...) {
ylabel <- paste(deparse(substitute(fun)), "of ", deparse(substitute(response)))
type <- match.arg(type)
cells <- tapply(response, list(x.factor, trace.factor), fun)
nr <- nrow(cells)
nc <- ncol(cells)
xvals <- 1L:nr
if (is.ordered(x.factor)) {
wn <- getOption("warn")
options(warn = -1)
xnm <- as.numeric(levels(x.factor))
options(warn = wn)
if (!anyNA(xnm))
xvals <- xnm
}
xlabs <- rownames(cells)
ylabs <- colnames(cells)
nch <- max(sapply(ylabs, nchar, type = "width"))
if (is.null(xlabs))
xlabs <- as.character(xvals)
if (is.null(ylabs))
ylabs <- as.character(1L:nc)
xlim <- range(xvals)
if (is.null(xleg)) {
xleg <- xlim[2L] + 0.05 * diff(xlim)
xlim <- xlim + c(-0.2/nr, if (legend) 0.2 + 0.02 * nch else 0.2/nr) *
diff(xlim)
}
dev.hold()
on.exit(dev.flush())
matplot(xvals, cells, ..., type = type, xlim = xlim, ylim = ylim,
xlab = xlab, ylab = ylab, axes = axes, xaxt = "n",
col = col, lty = lty, pch = pch)
if (axes && xaxt != "n") {
axisInt <- function(x, main, sub, lwd, bg, log, asp,
...) axis(1, x, ...)
mgp. <- par("mgp")
if (!xtick)
mgp.[2L] <- 0
axisInt(1, at = xvals, labels = xlabs, tick = xtick,
mgp = mgp., xaxt = xaxt, ...)
}
if (legend) {
yrng <- diff(ylim)
if (is.null(yleg))
yleg <- ylim[2L] - 0.1 * yrng
if (!is.null(xpd) || {
xpd. <- par("xpd")
!is.na(xpd.) && !xpd. && (xpd <- TRUE)
}) {
op <- par(xpd = xpd)
on.exit(par(op), add = TRUE)
}
# text(xleg, ylim[2L] - 0.05 * yrng, paste(" ",
# trace.label), adj = 0)
if (!fixed) {
ord <- sort.list(cells[nr, ], decreasing = TRUE)
ylabs <- ylabs[ord]
lty <- lty[1 + (ord - 1)%%length(lty)]
col <- col[1 + (ord - 1)%%length(col)]
pch <- pch[ord]
}
legend(xleg, yleg, legend = ylabs, col = col,
title = if (trace.label == "") NULL else trace.label,
pch = if (type %in% c("p", "b"))
pch, lty = if (type %in% c("l", "b"))
lty, bty = leg.bty, bg = leg.bg)
}
invisible()
}
Data:
lk <- "https://learningstatisticswithr.com/data.zip"
tmp <- tempfile()
tmp.dir <- tempdir()
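download.file(lk, tmp, mode = "wb")  # binary mode, needed for zip files on Windows
unzip(tmp, exdir = tmp.dir)
load(file.path(tmp.dir, "data", "clinicaltrial.Rdata"))  # the files were extracted into tmp.dir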
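If you would rather not redefine interaction.plot, another option is to switch off its built-in legend with legend = FALSE and call legend() yourself. This sketch uses the standard function and small made-up data standing in for clin.trial (the toy data frame and its values are hypothetical). It relies on interaction.plot drawing one line per level of trace.factor, in levels() order, so the same col and lty vectors line up with the legend labels.

```r
# Sketch with hypothetical toy data standing in for clin.trial.
set.seed(10)
toy <- data.frame(
  drug    = factor(rep(c("placebo", "anxifree", "joyzepam"), each = 6)),
  therapy = factor(rep(c("no.therapy", "CBT"), times = 9)),
  gain    = runif(18)
)
cols <- c("#00AFBB", "#E7B800")
ltys <- c(1, 1)

interaction.plot(x.factor = toy$drug, trace.factor = toy$therapy,
                 response = toy$gain, fun = mean, type = "l",
                 lty = ltys, lwd = 2, col = cols,
                 legend = FALSE,            # suppress the built-in legend
                 xlab = "Drug", ylab = "Mood Gain")
# one line per level of the trace factor, so labels come from levels()
legend("topleft", legend = levels(toy$therapy), col = cols, lty = ltys,
       lwd = 2, title = "Therapy", bty = "n")
```

Any position accepted by legend(), a keyword or x/y coordinates, can be used in place of "topleft".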

How to keep equal relative sizes of points in a graph?

I want to make a graph where the size of the circles indicates the size of the sample. If I use plot as in p1(), it works fine.
But if I try to color the different types of points, the relative sizes are wrong.
How would I get both the red and green circles to be the same size?
p1<-function() {
plot(t$x,t$y,cex=100*t$size,xlim=c(0,1),ylim=c(0.,1.))
}
p2<-function() {
plot(t$x[t$r=="0"],t$y[t$r=="0"],xlim=c(0,1),ylim=c(0.,1.),cex=100*t$size,col="red")
points(t$x[t$r=="1"],t$y[t$r=="1"],xlim=c(0,1),ylim=c(0.,1.),cex=100*t$size,col="green")
}
l<-20
x<-seq(0,1,1/l)
y<-sqrt(x)
r=round(runif(n=length(x),min=0,max=.8))
n<-1:length(x)
size=n/sum(n)
t<-data.frame(x,y,r,n,size)
t$r<-factor(r)
str(t)
p1()
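You have to change function p2 a bit. You pass all of t$size as cex, when you should subset it by the factor t$r, just as you do for the coordinates. Otherwise the cex values are taken from the start of the full vector, so each group gets the sizes of the first few points instead of its own.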
If you plot t$x[t$r == "0"] versus t$y[t$r == "0"] then you must use the sizes corresponding to those points, which are t$size[t$r == "0"]. Alternatively, you could subset the data frame t first, and then use those two resulting data frames to plot the points. See function p2_alt at the end.
p2 <- function() {
plot(t$x[t$r == "0"], t$y[t$r == "0"],
xlim = c(0, 1), ylim = c(0., 1.),
cex = 100*t$size[t$r == "0"],
col = "red",
xlab = "x", ylab = "y")
points(t$x[t$r == "1"],
t$y[t$r == "1"],
xlim = c(0, 1), ylim = c(0., 1.),
cex = 100*t$size[t$r == "1"],
col = "green")
}
set.seed(651) # make the results reproducible
l <- 20
x <- seq(0, 1, 1/l)
y <- sqrt(x)
r <- round(runif(n = length(x), min = 0, max = 0.8))
n <- 1:length(x)
size <- n/sum(n)
t <- data.frame(x, y, r, n, size)
t$r <- factor(r)
#str(t)
#p1()
p2()
p2_alt <- function() {
df1 <- subset(t, r == "0")
df2 <- subset(t, r == "1")
plot(df1$x, df1$y,
xlim = c(0, 1), ylim = c(0., 1.),
cex = 100*df1$size,
col = "red",
xlab = "x", ylab = "y")
points(df2$x,
df2$y,
xlim = c(0, 1), ylim = c(0., 1.),
cex = 100*df2$size,
col = "green")
}
p2_alt()
The graph is exactly the same, but maybe the code is more readable.
Finally, note that I have added arguments xlab and ylab to both p2() and p2_alt().
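A third option, not from the answer above: since x, y, cex and col can all be vectors with one entry per point, a single plot() call keeps each circle's size attached to its own row and picks the colour by looking up the factor level. The named vector pal and the function name p3 are illustrative.

```r
# Sketch: one plot() call; colour looked up per point from the level of t$r.
set.seed(651)
l <- 20
x <- seq(0, 1, 1/l)
y <- sqrt(x)
r <- round(runif(n = length(x), min = 0, max = 0.8))
n <- 1:length(x)
size <- n/sum(n)
t <- data.frame(x, y, r, n, size)
t$r <- factor(r)

p3 <- function() {
  pal <- c("0" = "red", "1" = "green")   # indexed by level name, not position
  point_cols <- unname(pal[as.character(t$r)])
  plot(t$x, t$y, xlim = c(0, 1), ylim = c(0, 1),
       cex = 100 * t$size,               # x, y, cex and col: one entry per row
       col = point_cols, xlab = "x", ylab = "y")
  invisible(point_cols)
}
cols <- p3()
```

Indexing by level name rather than by factor code keeps the mapping correct even if one of the two groups happens to be absent.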

R base graphics: overlapping axis tick labels from different plots with layout

I'm making stacked boxplots and plots on top of one another using R's layout command in base graphics.
The graphs look great, except that y-axis labels from different plots overlap (highlighted in red circle):
Similar questions are here online, but none of them use layout.
I don't want to expand the space between the plots; I don't think that would look good.
I've tried reducing the font size, but the labels still go outside the plot area.
How can I set R's boxplot and plot so that the tick labels do not go above or below the red line, i.e. the max/min y-value, in the image above?
Some example code
legend_space <- -0.26
right <- 10.5
bottom <- 0
left <- 5
top <- 0
cex_main = 1
setEPS()
postscript('figure.eps')
g1 <- c()
g2 <- c()
p <- c()
percent <- c()
sum_p <- c()
sum_percent <- c()
g1_means <- c()
g2_means <- c()
xaxis <- c()
sum_p[1] <- 0.0430904
xaxis[1] <- 2984116
p[1] <- 0.0430904
percent[1] <- -65.1758
sum_percent[1] <- -65.1758
g1[1] <- list(c(47.058824,100.000000,100.000000))
g1_means[1] <- 84.482759
g2[1] <- list(c(13.750000,4.123711,96.000000))
g2_means[1] <- 19.306931
sum_p[2] <- 0.0443229
xaxis[2] <- 2984148
p[2] <- 0.0587825
percent[2] <- -73.8956
sum_percent[2] <- -69.332
g1[2] <- list(c(94.285714,94.736842,100.000000))
g1_means[2] <- 95.145631
g2[2] <- list(c(10.588235,0.000000,92.592593))
g2_means[2] <- 21.250000
sum_p[3] <- 0.0444647
xaxis[3] <- 2984157
p[3] <- 0.124606
percent[3] <- -40.2577
sum_percent[3] <- -60.3056
g1[3] <- list(c(76.315789,94.736842,64.705882))
g1_means[3] <- 83.928571
g2[3] <- list(c(63.529412,0.000000,60.000000))
g2_means[3] <- 43.670886
sum_p[4] <- 0.0393696
xaxis[4] <- 2984168
p[4] <- 0.0310268
percent[4] <- -38.4133
sum_percent[4] <- -54.7893
g1[4] <- list(c(59.459459,57.894737,100.000000))
g1_means[4] <- 64.864865
g2[4] <- list(c(35.294118,6.250000,36.363636))
g2_means[4] <- 26.451613
sum_p[5] <- 0.0304293
xaxis[5] <- 2984175
p[5] <- 0.0344261
percent[5] <- -50.5157
sum_percent[5] <- -54.0582
g1[5] <- list(c(62.500000,94.736842,100.000000))
g1_means[5] <- 85.849057
g2[5] <- list(c(52.873563,6.250000,26.666667))
g2_means[5] <- 35.333333
layout(matrix(c(0,0,1,1,2,2,3,3,4,4), nrow = 5, byrow = TRUE), heights = c(0.2,1,1,1,1.4))
par(mar = c(bottom, left, top, right))
boxplot(g1, xaxt = 'n', range = 0, ylab = '%', ylim = c(0,100), col = 'white', cex.lab=1.5, cex.axis=cex_main, cex.main=cex_main, cex.sub=1.5, xlim = c(0.5,length(g1)+0.5))
title('title', outer = TRUE, line = -1.5)
lines(g1_means, col='dark green', lwd = 3)
par(xpd=TRUE)
legend('topright',inset = c(legend_space,0), c('Control', 'Weighted Mean'), col = c('black','dark green'), lwd = c(1,3))
boxplot(g2, xaxt = 'n', range = 0, main = NULL, ylim = c(0,100), ylab = '%', col = 'gray', cex.lab=1.5, cex.axis=cex_main, cex.main=cex_main, cex.sub=1.5, xlim = c(0.5,length(g2)+0.5))
lines(g2_means, col='dark green', lwd = 3)
par(xpd=TRUE)
legend('topright',inset = c(legend_space,0),c('Case', 'Weighted Mean'), col = c('black','dark green'), lwd = c(1,3))
par(mar = c(bottom, left, top, right)) # 'mar': a numerical vector of the form c(bottom, left, top, right)
plot(p, xaxt='n', ylab = 'P', type = 'l', lty = 1, lwd = 3, cex.lab=1.5, cex.axis=1, cex.main=cex_main, cex.sub=1.5, log = 'y', xlim = c(0.5,length(p)+0.5), ylim = c(min(p,sum_p), max(p, sum_p)))
points(sum_p, xaxt='n', ylab = 'P', type = 'l', col = 'blue', lty = 2, lwd = 3)
legend('topright', inset=c(legend_space,0), c('CpG P', 'Moving P Mean'), col = c('black','blue'), lwd=c(3,3), lty=c(1,2))
par(mar = c(bottom+5, left, top, right)) # 'mar': a numerical vector of the form c(bottom, left, top, right)
plot(percent, xaxt='n', ylab = '% Diff.', xlab = 'CpG', type = 'l', lty = 1, lwd = 3, cex.lab=1.5, cex.axis=1, cex.main=cex_main, cex.sub=1.5, xlim = c(0.5,length(percent)+0.5), ylim = c(min(percent, sum_percent), max(percent, sum_percent)))
points(sum_percent, xaxt='n', xlab = 'CpG', type = 'l', col = 'blue', lty = 2, lwd = 3)
par(xpd=TRUE)
legend('topright', inset=c(legend_space,0), c('Percent', 'Moving Mean %'), col = c('black','blue'), lwd=c(3,3), lty=c(1,2))
axis( 1, at=1:length(xaxis), xaxis)
dev.off()
Thanks to @count, the solution is simply to set las=2 in par.
las=2 draws tick labels perpendicular to the axis, so the y-axis labels become horizontal and no longer extend above or below each panel.
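A minimal sketch of the fix with made-up random-walk data (not the poster's figure). With zero top and bottom margins, the default las = 0 prints y tick labels along the axis, so a label near the top or bottom of a panel can run into its neighbour; las = 2 turns them horizontal. Note that las also applies to the x-axis labels.

```r
# Sketch with made-up data: stacked panels with no gap between them.
set.seed(3)
layout(matrix(1:3, ncol = 1))
op <- par(mar = c(0, 5, 0, 1), oma = c(4, 0, 1, 0),
          las = 2)                    # tick labels perpendicular to each axis
for (k in 1:3) {
  plot(cumsum(rnorm(20)), type = "l", xaxt = "n", ylab = "")
}
axis(1)                               # x axis only under the bottom panel
par(op)                               # restore the previous settings
layout(1)
```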

R - color scatterplot points by z value with legend

I have a scatterplot and wish to color the points by a z value assigned to each point. Then I want to get the legend on the right hand side of the plot to show what colors correspond to what z values using a nice smooth color spectrum.
Here are some x,y,z values you can use so that this is a reproducible example.
x = runif(50)
y = runif(50)
z = runif(50) #determines color of the (x,y) point
I suppose the best answer would be one that is generalized for any color function, but I do anticipate using rainbow().
Translated from this previous question:
library(ggplot2)
d = data.frame(x=runif(50),y=runif(50),z=runif(50))
ggplot(data = d, mapping = aes(x = x, y = y)) + geom_point(aes(colour = z), shape = 19)
If you don't want to use ggplot2, here is a modified version of a solution provided by someone else (I don't remember who).
scatter_fill <- function (x, y, z,xlim=c(min(x),max(x)),ylim=c(min(y),max(y)),zlim=c(min(z),max(z)),
nlevels = 20, plot.title, plot.axes,
key.title, key.axes, asp = NA, xaxs = "i",
yaxs = "i", las = 1,
axes = TRUE, frame.plot = axes, ...)
{
mar.orig <- (par.orig <- par(c("mar", "las", "mfrow")))$mar
on.exit(par(par.orig))
w <- (3 + mar.orig[2L]) * par("csi") * 2.54
layout(matrix(c(2, 1), ncol = 2L), widths = c(1, lcm(w)))
par(las = las)
mar <- mar.orig
mar[4L] <- mar[2L]
mar[2L] <- 1
par(mar = mar)
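# choose colors to interpolate: nlevels bands between nlevels + 1 boundaries
levels <- seq(zlim[1],zlim[2],length.out = nlevels + 1)
col <- colorRampPalette(c("red","yellow","dark green"))(nlevels)
# bin z on the same boundaries as the key (points outside zlim get NA and are not drawn)
colz <- col[cut(z, levels, include.lowest = TRUE)]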
#
plot.new()
plot.window(xlim = c(0, 1), ylim = range(levels), xaxs = "i", yaxs = "i")
rect(0, levels[-length(levels)], 1, levels[-1L],col=col,border=col)
if (missing(key.axes)) {if (axes){axis(4)}}
else key.axes
box()
if (!missing(key.title))
key.title
mar <- mar.orig
mar[4L] <- 1
par(mar = mar)
# points
plot(x,y,type = "n",xaxt='n',yaxt='n',xlab="",ylab="",xlim=xlim,ylim=ylim,bty="n")
points(x,y,col = colz,xaxt='n',yaxt='n',xlab="",ylab="",bty="n",...)
## options to make mapping more customizable
if (missing(plot.axes)) {
if (axes) {
title(main = "", xlab = "", ylab = "")
Axis(x, side = 1)
Axis(y, side = 2)
}
}
else plot.axes
if (frame.plot)
box()
if (missing(plot.title))
title(...)
else plot.title
invisible()
}
Run the function definition once and it is ready to use. It is quite handy.
# random vectors
vx <- rnorm(40,0,1)
vy <- rnorm(40,0,1)
vz <- rnorm(40,10,10)
scatter_fill(vx,vy,vz,nlevels=15,xlim=c(-1,1),ylim=c(-1,5),zlim=c(-10,10),main="TEST",pch=".",cex=8)
As you can see, it also accepts the usual plot arguments such as main, pch and cex, which are passed on through the ... argument.
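scatter_fill() hard-codes a red-yellow-green ramp. Since the question asks for something that works with any colour function such as rainbow(), here is a smaller sketch of the mapping step on its own. The helper z_to_col is hypothetical, and it uses a stepped legend with a few reference values rather than a smooth key.

```r
# Hypothetical helper: map numeric z onto n colours from any palette function.
z_to_col <- function(z, palette = function(n) rainbow(n, end = 0.7),
                     n = 64, zlim = range(z)) {
  # end = 0.7 stops the rainbow before it wraps back to red
  breaks <- seq(zlim[1], zlim[2], length.out = n + 1)
  # bin index 1..n; values outside zlim are clamped to the end bins
  idx <- findInterval(z, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  palette(n)[idx]
}

set.seed(2)
x <- runif(50); y <- runif(50); z <- runif(50)
zlim <- range(z)

op <- par(mar = c(5, 4, 4, 7), xpd = TRUE)   # widen right margin for the key
plot(x, y, pch = 19, col = z_to_col(z, zlim = zlim))
key <- seq(zlim[1], zlim[2], length.out = 5)  # five reference z values
legend("right", inset = c(-0.25, 0), title = "z", pch = 19,
       legend = format(round(key, 2)), col = z_to_col(key, zlim = zlim))
par(op)
```

Passing a different palette function, e.g. heat.colors or a colorRampPalette(...) result, changes the colours without touching the binning.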
Another alternative uses levelplot from the latticeExtra package, shown here with three different colour palettes:
library(latticeExtra)
library(RColorBrewer) # provides brewer.pal()
levelplot(z ~ x + y, panel = panel.levelplot.points, col.regions = heat.colors(50))
levelplot(z ~ x + y, panel = panel.levelplot.points,
col.regions =colorRampPalette(brewer.pal(11,"RdYlGn"))(50))
levelplot(z ~ x + y, panel = panel.levelplot.points, col.regions = rainbow(50))
