Plotting a barchart over a histogram in R

I am trying to overlay a histogram with a stacked barplot, yet the barplot is always shifted to the right because it starts plotting from zero. See below for an example of what I am trying to do (without using ggplot, I should add).
set.seed(1)
dat <- rnorm(1000, sd = 10)
h <- hist(dat)
cnt <- h$counts
breaks <- h$breaks
mat <- matrix(NA, nrow = 3, ncol = length(cnt))
for(i in 1:length(cnt)){
sample <- sample(1:3, size = cnt[i], replace = TRUE)
for(j in 1:3){
mat[j, i] <- sum(sample == j)
}
}
barplot(mat, add = TRUE, width = unique(diff(breaks)), space = 0,
col = c("blue", "green", "orange"))
The output from this code looks as follows:
I have tried using column names in the matrix mat to specify the position, but to no avail. In the plot I want to create, the histogram will be overplotted entirely, as it should be in the exact same place as the barplot. The only reason for plotting it in the first place is that I want the axis that a histogram plot gives me. Any ideas on how to do this are very much appreciated.

You can combine the bar midpoints that barplot returns with the breaks from hist to create the axis ticks: subtracting half the bar width from each midpoint gives the left edge of each bar. Using mtext gives better control over the axis labels.
h <- hist(dat, plot=FALSE)
# [...]
.width <- unique(diff(breaks))
b <- barplot(mat, width=.width, space=0,
col=c("blue", "green", "orange"))
axis(1, b-.width/2, labels=FALSE)
mtext(h$breaks[-length(h$breaks)], 1, 1, at=b-.width/2)
Edit
.width <- unique(diff(breaks))
b <- barplot(mat, width=.width, space=0,
col=c("blue", "green", "orange"))
ats <- seq(0, par()$usr[2], 5)
mod <- (ats + 5) %% 20 == 0
labs <- h$breaks
axis(1, ats[mod], labels=FALSE)
mtext(labs[mod], 1, 1, at=ats[mod])

I would just set a manual x axis to the barplot with the desired labels at the desired position, like this:
barplot(mat, width = unique(diff(breaks)),space = 0,
col = c("blue", "green", "orange"))
axis(1,at=c(15,35,55,75),labels = c(-20,0,20,40))
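The shift described in the question comes from the two functions using different x scales: hist draws in data coordinates, while barplot with space = 0 lays its bars out starting at 0. A minimal sketch of a tick placement that avoids hard-coded positions, assuming equal-width bins, is to subtract the first break from each data value. The names w and ticks are introduced here for illustration only.

```r
set.seed(1)
dat <- rnorm(1000, sd = 10)
h <- hist(dat, plot = FALSE)
breaks <- h$breaks
w <- diff(breaks)[1]  # assumption: all bins have the same width

# Randomly split each bin count into three groups, as in the question
mat <- sapply(h$counts, function(n)
  tabulate(sample(1:3, size = n, replace = TRUE), nbins = 3))

barplot(mat, width = w, space = 0, col = c("blue", "green", "orange"))

# Choose round data values inside the histogram range, then shift them
# by breaks[1] to get their position on the barplot's x scale
ticks <- pretty(breaks)
ticks <- ticks[ticks >= min(breaks) & ticks <= max(breaks)]
axis(1, at = ticks - breaks[1], labels = ticks)
```

Because the positions are derived from breaks, the same lines should keep working if the data or the number of bins changes.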

Related

Using base R, how to create a "joy plot" (aka ridgeline plots), with many distributions on top of each other with vertical offset?

The type of plot I am trying to achieve in R seems to be known as a moving distribution plot, a joy plot, or a ridgeline plot:
There is already a question on Stack Overflow whose answer explains how to do it using ggplot: How to reproduce this moving distribution plot with R?
However, for learning purposes, I am trying to achieve the same using only base R plots (no lattice, no ggplot, no plotting package of any kind).
In order to get started, I generated the following fake data to play with:
set.seed(2020)
shapes <- c(0.1, 0.5, 1, 2, 4, 5, 6)
dat <- lapply(shapes, function(x) rbeta(1000, x, x))
names(dat) <- letters[1:length(shapes)]
Then using mfrow I can achieve this:
par(mfrow=c(length(shapes), 1))
par(mar=c(1, 5, 1, 1))
for(i in 1:length(shapes))
{
values <- density(dat[[names(dat)[i]]])
plot(NA,
xlim=c(min(values$x), max(values$x)),
ylim=c(min(values$y), max(values$y)),
axes=FALSE,
main="",
xlab="",
ylab=letters[i])
polygon(values, col="light blue")
}
The result I get is:
Clearly, using mfrow (or even layout) here is not flexible enough and also does not allow for overlaps between the distributions.
Then, the question: how can I reproduce that type of plot using only base R plotting functions?
Here's a base R solution. First, we calculate all the density values and then manually offset them along the y axis:
vals <- Map(function(x, g, i) {
with(density(x), data.frame(x,y=y+(i-1), g))
}, dat, names(dat), seq_along(dat))
Then, to plot, we calculate the overall range, draw an empty plot, and then draw the densities (in reverse so they stack):
xrange <- range(unlist(lapply(vals, function(d) range(d$x))))
yrange <- range(unlist(lapply(vals, function(d) range(d$y))))
plot(0,0, type="n", xlim=xrange, ylim=yrange, yaxt="n", ylab="", xlab="Value")
for(d in rev(vals)) {
with(d, polygon(x, y, col="light blue"))
}
axis(2, at=seq_along(dat)-1, names(dat))
d = lapply(dat, function(x){
tmp = density(x)
data.frame(x = tmp$x, y = tmp$y)
})
d = lapply(seq_along(d), function(i){
tmp = d[[i]]
tmp$grp = names(d)[i]
tmp
})
d = do.call(rbind, d)
grp = unique(d$grp)
n = length(grp)
spcx = 5
spcy = 3
rx = range(d$x)
ry = range(d$y)
rx[2] = rx[2] + n/spcx
ry[2] = ry[2] + n/spcy
graphics.off()
plot(1, type = "n", xlim = rx, ylim = ry, axes = FALSE, ann = FALSE)
lapply(seq_along(grp), function(i){
x = grp[i]
abline(h = (n - i)/spcy, col = "grey")
axis(2, at = (n - i)/spcy, labels = grp[i])
polygon(d$x[d$grp == x] + (n - i)/spcx,
d$y[d$grp == x] + (n - i)/spcy,
col = rgb(0.5, 0.5, 0.5, 0.5))
})
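Both answers rely on the same idea: compute each density, add a per-row vertical offset, and draw the polygons from the top row down. A minimal sketch of a variant that makes the amount of overlap explicit is shown here. The overlap parameter is hypothetical (not from either answer) and rescales each curve so that its peak reaches that many row spacings.

```r
set.seed(2020)
shapes <- c(0.1, 0.5, 1, 2, 4, 5, 6)
dat <- lapply(shapes, function(s) rbeta(1000, s, s))
names(dat) <- letters[seq_along(shapes)]

overlap <- 1.5  # hypothetical knob: peak height in units of row spacing

dens <- lapply(dat, density)
# Rescale each curve to a peak of `overlap`, then shift row i up by i - 1
curves <- Map(function(d, i) {
  data.frame(x = d$x, y = d$y / max(d$y) * overlap + (i - 1))
}, dens, seq_along(dens))

xr <- range(sapply(curves, function(cv) range(cv$x)))
yr <- c(0, length(curves) - 1 + overlap)
plot(NA, xlim = xr, ylim = yr, yaxt = "n", xlab = "Value", ylab = "")
# Draw the top row first so that lower rows are painted over it
for (i in rev(seq_along(curves))) {
  with(curves[[i]], polygon(x, y, col = "lightblue"))
}
axis(2, at = seq_along(curves) - 1, labels = names(curves), las = 1)
```

Note that rescaling every curve to the same peak height discards the relative heights of the densities, so this is a presentational choice rather than a faithful comparison.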

Calculate and plot multiple densities?

I have a matrix with multiple columns and I'd like to calculate the density of each column, and then plot those densities in a single base R plot. Also, it would be easier if the plot scaled its axes automatically.
m <- matrix(rnorm(10), 5, 10)
Create a list of densities d, compute the xlim and ylim values and use those to create an empty plot. Finally draw each of the densities on that plot and optionally draw a legend. As requested, this uses only base R.
set.seed(123)
m <- matrix(rnorm(50), 5, 10) # test data
d <- apply(m, 2, density)
xlim <- range(sapply(d, "[[", "x"))
ylim <- range(sapply(d, "[[", "y"))
plot(NA, xlim = xlim, ylim = ylim, ylab = "density")
nc <- ncol(m)
cols <- rainbow(nc)
for(i in 1:nc) lines(d[[i]], col = cols[i])
legend("topright", legend = 1:nc, lty = 1, col = cols, cex = 0.7)
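A possible variation, sketched here under the assumption that evaluating every density on one shared x grid is acceptable, is to pass from, to and n to density so that the results form a matrix that matplot can draw in a single call. The 3-bandwidth padding imitates density's default cut = 3, and the names pad, xs and ys are illustrative only.

```r
set.seed(123)
m <- matrix(rnorm(50), 5, 10)  # test data

# Pad the range by three times the largest default bandwidth
pad <- 3 * max(apply(m, 2, bw.nrd0))
from <- min(m) - pad
to <- max(m) + pad
n <- 256

# Every column is evaluated on the same n grid points
xs <- seq(from, to, length.out = n)
ys <- apply(m, 2, function(col) density(col, from = from, to = to, n = n)$y)

cols <- rainbow(ncol(m))
matplot(xs, ys, type = "l", lty = 1, col = cols,
        xlab = "value", ylab = "density")
legend("topright", legend = seq_len(ncol(m)), lty = 1, col = cols, cex = 0.7)
```

matplot derives the axis limits from the matrix, so no separate xlim and ylim computation is needed.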
It can also be done with ggplot2:
library(reshape2)
library(ggplot2)
#Data
set.seed(123)
m <- as.data.frame(matrix(rnorm(50), 5, 10))
#Melt
meltdata <- melt(m)
#Plot 1
ggplot(meltdata,aes(value,color=variable))+
geom_density()+ggtitle('Plot 1')
#Plot 2
ggplot(meltdata,aes(value,fill=variable))+
geom_density(alpha=0.6)+ggtitle('Plot 2')

R. How to avoid lines connecting dots in dotplot

I made a plot using plot() using RStudio.
x = X$pos
y = X$anc
z = data.frame(x,y)
#cut in segments
my_segments = c(52660, 106784, 151429, 192098, 233666,
273857, 307933, 343048, 373099, 408960,
441545, 472813, 497822, 518561, 537471,
556747, 571683, 591232, 599519, 616567,
625727, 633744)
my_cuts = cut(x,my_segments, labels = FALSE)
my_cuts[is.na(my_cuts)] = 0
This is the code:
#create subset of segments
z_alt = z
z_alt[my_cuts %% 2 == 0,] = NA
#plot black points, then overlay alternating segments in red
plot(z, type="p", cex = 0.3,pch = 16,
col="black",
lwd=0.2,
frame.plot = F,
xaxt = 'n', # removes x labels,
ylim = c(0.3, 0.7),
las = 2,
xlim = c(0, 633744),
cex.lab=1.5, # size of axis labels
ann = FALSE, # remove axis titles
mgp = c(3, 0.7, 0))
lines(z_alt,col="red", lwd=0.2)
# adjust y axis label size
par(cex.axis= 1.2, tck=-0.03)
As you can see, some black dots are separate, but other black dots have red connecting lines. Does anyone know how to remove these annoying lines? I just want black and red dots. Many thanks.
There is no need to draw the points with a second function call. You can set the colors directly in the plot function using a color vector.
# create some example data, since none was provided
set.seed(123)
df <- data.frame(x=1:100,y=runif(100))
# some segment breaks
my_segments <- c(0,10,20,50,60)
gr <- cut(df$x, my_segments,labels = FALSE, right = T)
gr[is.na(gr)] <- 0
# create color vector with 1 == black, and 2 == red
df$color <- ifelse(gr %% 2 == 0, 1, 2)
# and the plot
plot(df$x, df$y, col = df$color, pch = 16)
The problem here is that you are using lines to add your z_alt. As the name of the function suggests, you will be adding lines. Use points instead.
z <- runif(20,0,1)
z_alt <- runif(20,0.8,1.2)
plot(z, type="p", col="black", pch = 16, lwd=0.2, ylim = c(0,1.4))
points(z_alt, col = "red", pch = 16, lwd = 0.2)
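The same fix can be applied to the NA-masking approach from the question. The following is a minimal sketch that uses made-up stand-in data, since the original X was not provided. The segment breaks are also invented for illustration.

```r
set.seed(42)
# Stand-in for the question's data frame (values are made up)
z <- data.frame(x = sort(runif(300, 0, 100)), y = runif(300, 0.3, 0.7))

my_segments <- seq(0, 100, by = 10)  # hypothetical segment boundaries
my_cuts <- cut(z$x, my_segments, labels = FALSE)
my_cuts[is.na(my_cuts)] <- 0

# Keep only the odd-numbered segments; everything else becomes NA
z_alt <- z
z_alt[my_cuts %% 2 == 0, ] <- NA

plot(z, type = "p", pch = 16, cex = 0.5, col = "black")
# points() draws isolated markers, so nothing connects across segments
points(z_alt, pch = 16, cex = 0.5, col = "red")
```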

Plot two time series with different y-axes: one as a dot plot (or a bar plot) and the other as a line

I have two time series of data, each with a different range of values. I would like to plot one as a dotplot and the other as a line over the dotplot. (I would settle for a decent-looking barplot and a line over the barplot, but my preference is a dotplot.)
#make some data
require(lubridate)
require(ggplot2)
x1 <- sample(1990:2010, 10, replace=F)
x1 <- paste(x1, "-01-01", sep="")
x1 <- as.Date(x1)
y1 <- sample(1:10, 10, replace=T)
data1 <- cbind.data.frame(x1, y1)
year <- sample(1990:2010, 10, replace=F)
month <- sample(1:9, 10, replace=T)
day <- sample(1:28, 10, replace=T)
x2 <- paste(year, month, day, sep="-")
x2 <- as.Date(x2)
y2 <- sample(100:200, 10, replace=T)
data2 <- cbind.data.frame(x2, y2)
data2 <- data2[with(data2, order(x2)), ]
# frequency data for dot plot
x3 <- sample(1990:2010, 25, replace=T)
data3 <- as.data.frame(x3)
I can make a dotplot or barplot with one data set in ggplot:
ggplot() + geom_dotplot(data=data3, aes(x=x3))
ggplot() + geom_bar(data=data1, aes(x=x1, y=y1), stat="identity")
But I can't overlay the second data set because ggplot doesn't permit a second y-axis.
I can't figure out how to plot a time series using barplot().
I can plot the first set of data as an "h" type plot, using plot(), and add the second set of data as a line, but I can't make the bars any thicker because each one corresponds to a single day over a stretch of many years, and I think it's ugly.
plot(data1$x1, data1$y1, type="h")
par(new = T)
plot(data2$x2, data2$y2, type="l", axes=F, xlab=NA, ylab=NA)
axis(side=4)
Any ideas? My only remaining idea is to make two separate plots and overlay them in a graphics program. :/
An easy workaround is to follow your base plotting instinct and beef up lwd for type='h'. Be sure to set lend=1 to prevent rounded lines:
par(mar=c(5, 4, 2, 5) + 0.1)
plot(data1, type='h', lwd=20, lend=1, las=1, xlab='Date', col='gray',
xlim=range(data1$x1, data2$x2))
par(new=TRUE)
plot(data2, axes=FALSE, type='o', pch=20, xlab='', ylab='', lwd=2,
xlim=range(data1$x1, data2$x2))
axis(4, las=1)
mtext('y2', 4, 3.5)
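If a bar width in data units is preferred over a line width in pixels, one alternative (not part of the answer above) is to draw the bars with rect() on an empty plot. This is a rough sketch with freshly generated stand-in data. The half-width of 120 days is an arbitrary assumption chosen only so the bars are visible on a 20-year axis.

```r
set.seed(1)
# Stand-in data shaped like the question's two series
x1 <- as.Date(sprintf("%d-01-01", sample(1990:2010, 10)))
y1 <- sample(1:10, 10, replace = TRUE)
x2 <- sort(as.Date(sprintf("%d-%02d-%02d", sample(1990:2010, 10),
                           sample(1:9, 10, replace = TRUE),
                           sample(1:28, 10, replace = TRUE))))
y2 <- sample(100:200, 10, replace = TRUE)

xlim <- range(x1, x2)
half <- 120  # assumed half-width of each bar, in days

par(mar = c(5, 4, 2, 5) + 0.1)
# Empty frame for the first series, then one rectangle per observation
plot(x1, y1, type = "n", xlim = xlim, ylim = c(0, max(y1)),
     xlab = "Date", ylab = "y1", las = 1)
rect(as.numeric(x1) - half, 0, as.numeric(x1) + half, y1,
     col = "gray", border = NA)

# Second series on its own y scale, sharing the same x limits
par(new = TRUE)
plot(x2, y2, type = "o", pch = 20, lwd = 2, axes = FALSE,
     xlab = "", ylab = "", xlim = xlim)
axis(4, las = 1)
mtext("y2", side = 4, line = 3.5)
```

Unlike lwd, the rectangle width scales with the plot, so the bars keep the same extent in days when the device is resized.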
I removed the original answer.
To answer your question about making a dot plot, you can rearrange your data so that you can use the base plotting function. An example:
Use the chron package for plotting:
library(chron)
Dummy data:
count.data <- data.frame("dates" = c("1/27/2000", "3/27/2000", "6/27/2000", "10/27/2000"), "counts" = c(3, 10, 5, 1), stringsAsFactors = FALSE)
Replicate the dates in a list:
rep.dates <- sapply(1:nrow(count.data), function(x) rep(count.data$dates[x], count.data$counts[x]))
Turn the counts into a sequence:
seq.counts <- sapply(1:nrow(count.data), function(x) seq(1, count.data$counts[x], 1))
Plot it up:
plot(as.chron(rep.dates[[1]]), seq.counts[[1]], xlim = c(as.chron("1/1/2000"), as.chron("12/31/2000")),
ylim = c(0, 20), pch = 20, cex = 2)
for(i in 2:length(rep.dates)){
points(as.chron(rep.dates[[i]]), seq.counts[[i]], pch = 20, cex = 2)
}
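The replicate-and-sequence steps above can also be written without a loop or an extra package. This is a minimal sketch that swaps in base R Date objects for chron (a substitution, not the answer's method) and uses rep() and sequence() to build the stacked coordinates. The names xs and ys are illustrative.

```r
count.data <- data.frame(
  dates = as.Date(c("2000-01-27", "2000-03-27", "2000-06-27", "2000-10-27")),
  counts = c(3, 10, 5, 1)
)

# One x value per dot: each date repeated `counts` times
xs <- rep(count.data$dates, count.data$counts)
# Matching y values: 1..count for each date, concatenated
ys <- sequence(count.data$counts)

plot(xs, ys, pch = 20, cex = 2,
     xlim = as.Date(c("2000-01-01", "2000-12-31")),
     ylim = c(0, max(ys) + 1), xlab = "Date", ylab = "Count")
```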

Color code a scatterplot based on another value - no ggplot

I have to plot some plot(x,y) scatters, but I would like the points to be color coded based on the value of a continuous variable z.
I would like a temperature palette (from dark blue to bright red). I tried RColorBrewer; however, the RdBu palette (which resembles the temperature palette) uses white for the middle values, which looks very bad.
I would also like to plot a legend explaining the color coding with a sample of colors and corresponding values.
Any ideas if this can be performed easily in R? No ggplot please!
Season greetings to everybody
Building off of @BenBolker's answer, you can do the legend if you take a peek at the code for filled.contour. I hacked that function apart to look like this:
scatter.fill <- function (x, y, z,
nlevels = 20, plot.title, plot.axes,
key.title, key.axes, asp = NA, xaxs = "i",
yaxs = "i", las = 1,
axes = TRUE, frame.plot = axes, ...)
{
mar.orig <- (par.orig <- par(c("mar", "las", "mfrow")))$mar
on.exit(par(par.orig))
w <- (3 + mar.orig[2L]) * par("csi") * 2.54
layout(matrix(c(2, 1), ncol = 2L), widths = c(1, lcm(w)))
par(las = las)
mar <- mar.orig
mar[4L] <- mar[2L]
mar[2L] <- 1
par(mar = mar)
#Some simplified level/color picking
levels <- seq(min(z),max(z),length.out = nlevels)
col <- colorRampPalette(c("blue","red"))(nlevels)[rank(z)]
plot.new()
plot.window(xlim = c(0, 1), ylim = range(levels), xaxs = "i",
yaxs = "i")
rect(0, levels[-length(levels)], 1, levels[-1L], col = colorRampPalette(c("blue","red"))(nlevels))
if (missing(key.axes)) {
if (axes)
axis(4)
}
else key.axes
box()
if (!missing(key.title))
key.title
mar <- mar.orig
mar[4L] <- 1
par(mar = mar)
#Simplified scatter plot construction
plot(x,y,type = "n")
points(x,y,col = col,...)
if (missing(plot.axes)) {
if (axes) {
title(main = "", xlab = "", ylab = "")
Axis(x, side = 1)
Axis(y, side = 2)
}
}
else plot.axes
if (frame.plot)
box()
if (missing(plot.title))
title(...)
else plot.title
invisible()
}
And then applying the code from Ben's example we get this:
x <- runif(40)
y <- runif(40)
z <- runif(40)
scatter.fill(x,y,z,nlevels = 40,pch = 20)
which produces a plot like this:
Fair warning, I really did just hack apart the code for filled.contour. You will likely want to inspect the remaining code and remove unused bits, or fix parts that I rendered non-functional.
Here is some home-made code to achieve it with default packages (base, graphics, grDevices):
# Some data
x <- 1:1000
y <- rnorm(1000)
z <- 1:1000
# colorRamp produces custom palettes, but needs values between 0 and 1
colorFunction <- colorRamp(c("darkblue", "black", "red"))
zScaled <- (z - min(z)) / (max(z) - min(z))
# Apply colorRamp and switch to hexadecimal representation
zMatrix <- colorFunction(zScaled)
zColors <- rgb(zMatrix, maxColorValue=255)
# Let's plot
plot(x=x, y=y, col=zColors, pch="+")
For StanLe, here is the corresponding legend (to be added by layout or something similar) :
# Resolution of the legend
n <- 10
# colorRampPalette produces colors in the same way as colorRamp
plot(x=NA, y=NA, xlim=c(0,n), ylim=0:1, xaxt="n", yaxt="n", xlab="z", ylab="")
pal <- colorRampPalette(c("darkblue", "black", "red"))(n)
rect(xleft=0:(n-1), xright=1:n, ybottom=0, ytop=1, col=pal)
# Custom axis ticks (consider pretty() for an automated generation)
lab <- c(1, 500, 1000)
at <- (lab - min(z)) / (max(z) - min(z)) * n
axis(side=1, at=at, labels=lab)
This is a reasonable solution -- I used blue rather than dark blue for the starting point, but you can check out ?rgb etc. to adjust the color to your liking.
nbrk <- 30
x <- runif(20)
y <- runif(20)
cc <- colorRampPalette(c("blue","red"))(nbrk)
z <- runif(20)
plot(x,y,col=cc[cut(z,nbrk)],pch=16)
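The answers above show the color mapping and the legend separately. A minimal sketch that puts both in one figure, assuming a two-panel layout() is acceptable, could look like this. The names edges and idx are introduced here for illustration, and findInterval is used instead of cut so that every value is guaranteed to land in one of the n bins.

```r
set.seed(1)
x <- runif(200)
y <- runif(200)
z <- rnorm(200)

n <- 50
pal <- colorRampPalette(c("darkblue", "red"))(n)

# Bin z into n equal-width intervals; all.inside keeps indices in 1..n
edges <- seq(min(z), max(z), length.out = n + 1)
idx <- findInterval(z, edges, all.inside = TRUE)

# Left panel: scatter plot, right panel: narrow color key
layout(matrix(1:2, ncol = 2), widths = c(4, 1))
par(mar = c(4, 4, 2, 1))
plot(x, y, col = pal[idx], pch = 16)

par(mar = c(4, 1, 2, 4))
plot.new()
plot.window(xlim = c(0, 1), ylim = range(z), xaxs = "i", yaxs = "i")
rect(0, edges[-(n + 1)], 1, edges[-1], col = pal, border = NA)
axis(4, las = 1)
box()

layout(1)  # restore the single-panel layout
```

The key is drawn with rect(), following the same approach as the filled.contour-based answer, so the tick labels on the right are actual z values.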
