I have a matrix with multiple columns and I'd like to calculate the density of each column, and then plot those densities in a single base R plot. It would also help if the plot chose a suitable scale automatically.
m <- matrix(rnorm(50), 5, 10)
Create a list of densities d, compute the xlim and ylim values and use those to create an empty plot. Finally draw each of the densities on that plot and optionally draw a legend. As requested, this uses only base R.
set.seed(123)
m <- matrix(rnorm(50), 5, 10) # test data
d <- apply(m, 2, density)
xlim <- range(sapply(d, "[[", "x"))
ylim <- range(sapply(d, "[[", "y"))
plot(NA, xlim = xlim, ylim = ylim, ylab = "density")
nc <- ncol(m)
cols <- rainbow(nc)
for(i in 1:nc) lines(d[[i]], col = cols[i])
legend("topright", legend = 1:nc, lty = 1, col = cols, cex = 0.7)
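The same steps can be wrapped in a small helper so that the axis limits are always derived from the data. This is only a sketch: the function name plot_col_densities and its arguments are made up for illustration, and it assumes a numeric matrix with at least two rows.

```r
# Hypothetical helper: overlay one density curve per matrix column.
plot_col_densities <- function(m, cols = rainbow(ncol(m)), ...) {
  # One density estimate per column
  d <- lapply(seq_len(ncol(m)), function(j) density(m[, j]))
  # Axis limits that cover every curve
  xlim <- range(unlist(lapply(d, `[[`, "x")))
  ylim <- range(unlist(lapply(d, `[[`, "y")))
  # Empty plot with the combined limits, then one line per column
  plot(NA, xlim = xlim, ylim = ylim, xlab = "value", ylab = "density", ...)
  for (i in seq_along(d)) lines(d[[i]], col = cols[i])
  legend("topright", legend = seq_along(d), lty = 1, col = cols, cex = 0.7)
  # Return the pieces invisibly in case they are needed later
  invisible(list(density = d, xlim = xlim, ylim = ylim))
}

set.seed(123)
res <- plot_col_densities(matrix(rnorm(50), 5, 10))
```

Because the limits are computed inside the function, the same call should work unchanged for a matrix with a different number of columns or a different spread.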
It can also be done with ggplot2:
library(reshape2)
library(ggplot2)
#Data
set.seed(123)
m <- as.data.frame(matrix(rnorm(50), 5, 10))
#Melt
meltdata <- melt(m)
#Plot 1
ggplot(meltdata,aes(value,color=variable))+
geom_density()+ggtitle('Plot 1')
#Plot 2
ggplot(meltdata,aes(value,fill=variable))+
geom_density(alpha=0.6)+ggtitle('Plot 2')
The type of plot I am trying to achieve in R is known variously as a moving distribution plot, a joy plot, or a ridgeline plot.
There is already a question on Stack Overflow whose answer explains how to do it using ggplot: How to reproduce this moving distribution plot with R?
However, for learning purposes, I am trying to achieve the same using only base R plots (no lattice, no ggplot, no other plotting package).
In order to get started, I generated the following fake data to play with:
set.seed(2020)
shapes <- c(0.1, 0.5, 1, 2, 4, 5, 6)
dat <- lapply(shapes, function(x) rbeta(1000, x, x))
names(dat) <- letters[1:length(shapes)]
Then using mfrow I can achieve this:
par(mfrow=c(length(shapes), 1))
par(mar=c(1, 5, 1, 1))
for(i in 1:length(shapes))
{
  values <- density(dat[[names(dat)[i]]])
  plot(NA,
       xlim=c(min(values$x), max(values$x)),
       ylim=c(min(values$y), max(values$y)),
       axes=FALSE,
       main="",
       xlab="",
       ylab=letters[i])
  polygon(values, col="light blue")
}
The result I get is:
Clearly, using mfrow (or even layout) here is not flexible enough and also does not allow for overlaps between the distributions.
Then, the question: how can I reproduce that type of plot using only base R plotting functions?
Here's a base R solution. First, we calculate all the density values and then manually offset each curve along the y axis:
vals <- Map(function(x, g, i) {
  with(density(x), data.frame(x, y=y+(i-1), g))
}, dat, names(dat), seq_along(dat))
Then, to plot, we calculate the overall range, draw an empty plot, and then draw the densities (in reverse so they stack):
xrange <- range(unlist(lapply(vals, function(d) range(d$x))))
yrange <- range(unlist(lapply(vals, function(d) range(d$y))))
plot(0,0, type="n", xlim=xrange, ylim=yrange, yaxt="n", ylab="", xlab="Value")
for(d in rev(vals)) {
  with(d, polygon(x, y, col="light blue"))
}
axis(2, at=seq_along(dat)-1, names(dat))
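In the code above every curve is shifted by a fixed step of 1, so the amount of overlap depends on how tall the densities happen to be. A possible variation, sketched here with a made-up step variable, derives the offset from the tallest curve; a multiplier below 1 makes neighbouring ridges overlap. The 0.5 used here is an arbitrary choice, not something from the original answer.

```r
set.seed(2020)
shapes <- c(0.1, 0.5, 1, 2, 4, 5, 6)
dat <- lapply(shapes, function(s) rbeta(1000, s, s))
names(dat) <- letters[seq_along(shapes)]

dens <- lapply(dat, density)
# Assumed spacing rule: half the height of the tallest curve
step <- 0.5 * max(sapply(dens, function(d) max(d$y)))
offsets <- (seq_along(dens) - 1) * step

xrange <- range(unlist(lapply(dens, function(d) d$x)))
yrange <- c(0, max(offsets + sapply(dens, function(d) max(d$y))))
plot(0, 0, type = "n", xlim = xrange, ylim = yrange,
     yaxt = "n", xlab = "Value", ylab = "")
# Draw from the top ridge down so each lower ridge is painted over the one above
for (i in rev(seq_along(dens))) {
  polygon(dens[[i]]$x, dens[[i]]$y + offsets[i], col = "light blue")
}
axis(2, at = offsets, labels = names(dens), las = 1)
```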
d = lapply(dat, function(x){
  tmp = density(x)
  data.frame(x = tmp$x, y = tmp$y)
})
d = lapply(seq_along(d), function(i){
  tmp = d[[i]]
  tmp$grp = names(d)[i]
  tmp
})
d = do.call(rbind, d)
grp = unique(d$grp)
n = length(grp)
spcx = 5
spcy = 3
rx = range(d$x)
ry = range(d$y)
rx[2] = rx[2] + n/spcx
ry[2] = ry[2] + n/spcy
graphics.off()
plot(1, type = "n", xlim = rx, ylim = ry, axes = FALSE, ann = FALSE)
lapply(seq_along(grp), function(i){
  x = grp[i]
  abline(h = (n - i)/spcy, col = "grey")
  axis(2, at = (n - i)/spcy, labels = grp[i])
  polygon(d$x[d$grp == x] + (n - i)/spcx,
          d$y[d$grp == x] + (n - i)/spcy,
          col = rgb(0.5, 0.5, 0.5, 0.5))
})
I am trying to overlay a histogram with a stacked barplot, yet the barplot is always shifted to the right as it starts plotting from zero. See below for an example on what I am trying to do (without using ggplot, I should maybe add).
set.seed(1)
dat <- rnorm(1000, sd = 10)
h <- hist(dat)
cnt <- h$counts
breaks <- h$breaks
mat <- matrix(NA, nrow = 3, ncol = length(cnt))
for(i in 1:length(cnt)){
  sample <- sample(1:3, size = cnt[i], replace = TRUE)
  for(j in 1:3){
    mat[j, i] <- sum(sample == j)
  }
}
barplot(mat, add = TRUE, width = unique(diff(breaks)), space = 0,
col = c("blue", "green", "orange"))
The output from this code looks as follows:
I have tried using column names in the matrix mat to specify the position, but to no avail. In the plot I want to create, the histogram will be overplotted entirely, as it should be in the exact same place as the barplot. The reason for plotting it in the first place is that I want the axis that a histogram plot gives me. Any ideas on how to do this are very much appreciated.
You can combine the bar midpoints that barplot returns with the breaks from hist to create the axis ticks: subtract half the bar width from each midpoint. Using mtext gives better control over the axis labels.
h <- hist(dat, plot=FALSE)
# [...]
.width <- unique(diff(breaks))
b <- barplot(mat, width=.width, space=0,
col=c("blue", "green", "orange"))
axis(1, b-.width/2, labels=FALSE)
mtext(h$breaks[-length(h$breaks)], 1, 1, at=b-.width/2)
Edit
.width <- unique(diff(breaks))
b <- barplot(mat, width=.width, space=0,
col=c("blue", "green", "orange"))
ats <- seq(0, par()$usr[2], 5)
mod <- (ats + 5) %% 20 == 0
labs <- h$breaks
axis(1, ats[mod], labels=FALSE)
mtext(labs[mod], 1, 1, at=ats[mod])
I would just set a manual x axis to the barplot with the desired labels at the desired position, like this:
barplot(mat, width = unique(diff(breaks)),space = 0,
col = c("blue", "green", "orange"))
axis(1,at=c(15,35,55,75),labels = c(-20,0,20,40))
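Another route, not taken from the answers above, is to skip barplot() altogether and draw the stacked segments with rect() directly at the histogram breaks, so the bars share the histogram's x axis by construction. A minimal sketch reusing the question's data; the names top and bottom are illustrative:

```r
set.seed(1)
dat <- rnorm(1000, sd = 10)
h <- hist(dat, plot = FALSE)
cols <- c("blue", "green", "orange")

# Randomly split each bin count into three groups, as in the question
mat <- sapply(h$counts, function(n) {
  tabulate(sample(1:3, size = n, replace = TRUE), nbins = 3)
})

top <- apply(mat, 2, cumsum)   # upper edge of each stacked segment
bottom <- top - mat            # lower edge of each stacked segment

# Draw the histogram invisibly to get its axes, then add the segments
plot(h, col = NA, border = NA, main = "Stacked histogram")
nb <- length(h$breaks)
for (j in 1:3) {
  rect(h$breaks[-nb], bottom[j, ], h$breaks[-1], top[j, ], col = cols[j])
}
```

Since the rectangles use h$breaks as their left and right edges, no manual tick placement is needed.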
df1 <- read.csv("C:\\Users\\Unique\\Desktop\\Data Science\\R Scripts\\LimeBison_ch1.csv")
df2 <- df1[df1[,3] == 73.608125,]
plot(df2[,1],df2[,2], xlab = "Milliseconds", ylab = "Amplitude",
main = "Amplitude vs Time Graph",type = "p", pch =16, col = "red",
xlim = c(-200,1200), ylim = c(-1.5,1.5))
x <- tapply(df2$Amplitude, df2$Time, mean)
df3 <- data.frame(Time = names(x), Average_Amplitude = x)
How can I plot a curve of the data frame df3 over the scatter plot of df2?
I'm not sure about your data, but if you are using base plotting, then you can plot a line on top of a scatter plot by using the lines function:
df <- data.frame(x = 1:40, y = c(1:20, 20:1))
plot(df$x, df$y, cex = 2)
lines(df$x, df$y)
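Applied to the situation in the question, the idea might look like the following sketch. The CSV file is not available, so the data frame here is simulated and only the column names Time and Amplitude are taken from the question. Note that the names() of a tapply() result are character strings, so they are converted back to numbers before plotting.

```r
set.seed(42)
# Simulated stand-in for df2: 5 amplitude readings at each of 20 time points
df2 <- data.frame(Time = rep(seq(0, 950, by = 50), each = 5))
df2$Amplitude <- sin(df2$Time / 150) + rnorm(nrow(df2), sd = 0.2)

# Mean amplitude per time point
x <- tapply(df2$Amplitude, df2$Time, mean)
df3 <- data.frame(Time = as.numeric(names(x)),
                  Average_Amplitude = as.vector(x))

# Scatter plot first, then the curve of averages on top
plot(df2$Time, df2$Amplitude, xlab = "Milliseconds", ylab = "Amplitude",
     pch = 16, col = "red")
lines(df3$Time, df3$Average_Amplitude, lwd = 2)
```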
Background
I have a function called TPN. When you run this function, it produces two plots (see picture below). The bottom-row plot samples from the top-row plot.
Question
I'm wondering how I could fix the ylim of the bottom-row plot to be always (i.e., regardless of the input values) the same as ylim of the top-row plot?
R code is provided below the picture (Run the entire block of code).
############## Input Values #################
TPN = function( each.sub.pop.n = 150,
                sub.pop.means = 20:10,
                predict.range = 10:0,
                sub.pop.sd = .75,
                n.sample = 2 ) {
  #############################################
  par( mar = c(2, 4.1, 2.1, 2.1) )
  m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)
  set.seed(2460986)
  Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')
  y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )
  set.seed(NULL)
  x <- rep(predict.range, each = each.sub.pop.n)
  plot(x, y) ## Plot #1
  sample <- lapply(split(y, x), function(z) sample(z, n.sample, replace = TRUE))
  sample <- data.frame(y = unlist(sample),
                       x = as.numeric(rep(names(sample), each = n.sample)))
  plot(sample$x, sample$y) ## Plot # 2
}
## TEST HERE:
TPN()
You can get the extreme tick positions of the first plot's y axis using par("yaxp")[1:2]; these approximate its ylim. So, you can change the second plot's code to reuse them as its ylim:
plot(sample$x, sample$y, ylim = par("yaxp")[1:2]) ## Plot # 2
Or, as mentioned in the comments, you can simply compute ylim as the range of both data sets and pass it to both plots:
ylim = range(c(y, sample$y))
Another option: produce the same plot again but with type = "n", and then add the sampled points with points(). For example, change your plot 2 to
plot(x, y, type = "n")
points(sample$x, sample$y)
A benefit of this approach is that everything in the plot will be exactly the same, not just the y-axis (which may or may not matter for your function).
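For the range-based variant, a stripped-down sketch outside the TPN function could look like this. The data are simulated and the names sub and shared_ylim are placeholders, not part of the original function.

```r
set.seed(1)
x <- rep(0:10, each = 50)
y <- rnorm(length(x), mean = 20 - x, sd = 0.75)

# Two sampled points per x value, mimicking the second panel
idx <- unlist(lapply(split(seq_along(y), x), function(i) sample(i, 2)))
sub <- data.frame(x = x[idx], y = y[idx])

# The sample is a subset of y, so the range of y covers both panels
shared_ylim <- range(y)

layout(matrix(1:2, nrow = 2))
par(mar = c(2, 4.1, 2.1, 2.1))
plot(x, y, ylim = shared_ylim)           # top panel: full data
plot(sub$x, sub$y, ylim = shared_ylim)   # bottom panel: sampled points
```

Passing the same ylim explicitly to both calls avoids relying on whatever graphical state the first plot left behind.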
I want to compare two datasets with same x and y variables. However, not all X variable points are present on both. As a toy example say this is what I have:
position.x <- c(1,2,3)
score.x <- c(450,220,330)
x <- data.frame(position.x,score.x)
position.y <- c(2,3,5)
score.y <- c(333,423,988)
y<- data.frame(position.y,score.y)
par(mfrow = c(2,1))
plot(x, pch = 19)
plot(y, pch = 19)
The x axes are not comparable. I found some posts explaining how to do it in ggplot using facet_wrap, but I would like to do it using base graphics.
Thank you in advance.
You could specify the range of the x and y axes with xlim and ylim:
position.x <- c(1,2,3)
score.x <- c(450,220,330)
x <- data.frame(position.x,score.x)
position.y <- c(2,3,5)
score.y <- c(333,423,988)
y<- data.frame(position.y,score.y)
par(mfrow = c(2,1))
plot(x, pch = 19, xlim=c(1,5))
plot(y, pch = 19, xlim=c(1,5))
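Rather than typing the limits by hand, they can be derived from both data sets with range(). A small sketch using the toy data above (the name shared_xlim is illustrative):

```r
x <- data.frame(position.x = c(1, 2, 3), score.x = c(450, 220, 330))
y <- data.frame(position.y = c(2, 3, 5), score.y = c(333, 423, 988))

# Smallest and largest position across both data sets
shared_xlim <- range(x$position.x, y$position.y)

op <- par(mfrow = c(2, 1))
plot(x, pch = 19, xlim = shared_xlim)
plot(y, pch = 19, xlim = shared_xlim)
par(op)  # restore the previous layout
```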
If you are going to repeat this, you might as well write some kind of function (which is one of the benefits of ggplot: it takes care of all the set-up for you):
## data needs to be in a long format
dat <- data.frame(position = c(1,2,3,2,3,5),
score = c(450,220,330,333,423,988),
z = c('x','x','x','y','y','y'))
facet_wrap <- function(data, x, y, z, horiz = TRUE, ...) {
  ## save current par settings and return after finished
  op <- par(no.readonly = TRUE)
  on.exit(par(op))
  zz <- unique(data[, z])
  ## sets up the layout to cascade horizontally or vertically
  ## and sets xlim and ylim appropriately
  if (horiz) {
    par(mfrow = c(1, length(zz)), ...)
    ylim <- range(data[, y])
    xlim <- NULL
  } else {
    par(mfrow = c(length(zz), 1), ...)
    xlim <- range(data[, x])
    ylim <- NULL
  }
  ## make a subset of data for each unique by variable
  ## and draw a basic plot for each one
  for (ii in zz) {
    tmp <- data[data[, z] %in% ii, ]
    plot(tmp[, x], tmp[, y], xlim = xlim, ylim = ylim)
  }
}
facet_wrap(dat, 'position', 'score', 'z', mar = c(5,4,2,2))
facet_wrap(dat, 'position', 'score', 'z', mar = c(5,4,1,2), horiz = FALSE)