R - How do I line up axes between a barplot and matplot? - r

I have created a side-by-side plot in R where both plots are supposed to use the same y-axis. The plot on the left is a barplot and the plot on the right is a matplot, and in both plots I have set the same y-axis range, ylim = c(0, YMAX). Unfortunately, as you can see below, these plots do not appear to use the same layout range: the barplot takes the range right to the edges of the axis, whereas the matplot has a buffer at each edge of the axis. Consequently, the y-axes on the plots do not line up as intended.
#Create layout for plot
LAYOUT <- matrix(c(rep(1, 2), 2:3), nrow = 2, ncol = 2, byrow = TRUE);
layout(LAYOUT, heights = c(0.1, 1));
#Create plot matrix
par(mar = c(0.5, 2.1, 0.5, 2.1), mgp = c(3, 1, 0), las = 0);
plot.new();
text(0.5,0.5, 'Barplot and Violin Plot', cex = 1.2, font = 2);
par(mar = c(5.1, 4.1, 2.1, 2.1), mgp = c(3, 1, 0), las = 0);
#Generate data for plot
x <- 1:100
y <- rchisq(100, df = 40);
#Generate plots
DENS <- density(y);
YMAX <- 1.4*max(y);
barplot(y, names.arg = x, ylim = c(0, YMAX));
matplot(x = cbind(-DENS$y, DENS$y), y = DENS$x,
type = c('l', 'l'), lty = c(1, 1),
col = c('black', 'black'),
xlim = c(-max(DENS$y), max(DENS$y)),
ylim = c(0, YMAX),
xlab = 'Density', ylab = '');
How do I adjust this plot to line up the y-axes? (Ideally I would like the plot on the right to put the ticks right to the edge of the axis just as is the case on the left.)

The comment by user20650 solves my problem, so I am going to take the liberty of expanding it into a larger answer and linking to some documentation I found on the problem. According to some lecture notes on the base graphics parameters, some of the base plots in R add a 4% buffer at each end of the specified axis range by default. The graphical parameters xaxs = 'i' and yaxs = 'i' inhibit this buffer (on the x and y axes respectively), so that the axis limits go right to the edge of the axis.
Applying this solution to the y-axis in the present problem (we do not apply it to the x-axis, since we want to retain the buffer on that axis) gives the following commands and plot. As can be seen from the plot, the y-axes in the two plots now line up correctly. Hooray!
#Create layout for plot
LAYOUT <- matrix(c(rep(1, 2), 2:3), nrow = 2, ncol = 2, byrow = TRUE);
layout(LAYOUT, heights = c(0.1, 1));
#Create plot matrix
par(mar = c(0.5, 2.1, 0.5, 2.1), mgp = c(3, 1, 0), las = 0);
plot.new();
text(0.5,0.5, 'Barplot and Violin Plot', cex = 1.2, font = 2);
par(mar = c(5.1, 4.1, 2.1, 2.1), mgp = c(3, 1, 0), las = 0);
#Generate data for plot
x <- 1:100
y <- rchisq(100, df = 40);
#Generate plots
DENS <- density(y);
YMAX <- 1.4*max(y);
barplot(y, names.arg = x, ylim = c(0, YMAX));
matplot(x = cbind(-DENS$y, DENS$y), y = DENS$x, yaxs = 'i',
type = c('l', 'l'), lty = c(1, 1),
col = c('black', 'black'),
xlim = c(-max(DENS$y), max(DENS$y)),
ylim = c(0, YMAX),
xlab = 'Density', ylab = '');
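The size of the buffer can be checked directly from par("usr"), which reports the extremes of the plotting region. The following is a minimal sketch and not part of the original answer: the data and the names usr_regular and usr_internal are made up for illustration, and pdf(NULL) is used only so that no file is written.

```r
# Open a null PDF device so the example draws without creating a file
pdf(NULL)

# Default style (yaxs = "r"): the requested range is extended at each end
plot(1:10, ylim = c(0, 100))
usr_regular <- par("usr")[3:4]

# Internal style (yaxs = "i"): the axis spans exactly the requested range
plot(1:10, ylim = c(0, 100), yaxs = "i")
usr_internal <- par("usr")[3:4]

invisible(dev.off())

print(usr_regular)   # y-extent of the plot region with the default buffer
print(usr_internal)  # y-extent of the plot region without the buffer
```

With ylim = c(0, 100), the default style should report a region of roughly -4 to 104, while yaxs = "i" reports exactly 0 to 100.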

Related

How to remove repeated items in an R chart legend?

I'm working with starwars dataset (dplyr package), and I want to make a graph where the independent variable is the height of the characters, and the dependent variable is their body mass. Furthermore, I also want to discern species by colors:
library(dplyr)
starwars
par(mar = c(5.3, 4.3, 4.3, 8.3), xpd = TRUE)
plot(starwars$mass ~ starwars$height, ylim = c(0, 200),
col = as.factor(starwars$species), bty = "l")
legend("topright", inset = c(-0.2, 0), legend = as.factor(starwars$species),
cex = 0.50, col = as.factor(starwars$species),
ncol = 2, pch = 16)
Note that some species are repeated in the legend. How to exclude these repetitions?
Include only the unique values in the legend:
plot(starwars$mass ~ starwars$height, ylim = c(0, 200),
col = as.factor(starwars$species), bty = "l")
legend("topright", inset = c(-0.2, 0),
legend = as.factor(unique(starwars$species)),
cex = 0.50, col = as.factor(unique(starwars$species)),
ncol = 2, pch = 16)
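A variation on the same idea, sketched here under assumptions, is to convert the grouping variable to a factor once and build the legend from its levels, so that the colour codes in the plot and in the legend come from the same object. The small vectors below are hypothetical stand-ins for starwars$height, starwars$mass and starwars$species, and the names legend_labels and legend_cols are illustrative only.

```r
# Open a null PDF device so the example draws without creating a file
pdf(NULL)

# Hypothetical data standing in for the starwars columns
height  <- c(172, 167, 96, 202, 228)
mass    <- c(77, 75, 32, 136, 112)
species <- factor(c("Human", "Droid", "Droid", "Human", "Wookiee"))

# Colour each point by the integer code of its factor level
plot(mass ~ height, col = as.integer(species), pch = 16, bty = "l")

# One legend entry per level, using the same integer codes as colours
legend_labels <- levels(species)
legend_cols   <- seq_along(legend_labels)
legend("topleft", legend = legend_labels, col = legend_cols,
       pch = 16, cex = 0.75)

invisible(dev.off())
```

One difference to be aware of: levels() does not include NA, so rows with a missing species would not get a legend entry with this approach.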

Plotting CDF in histogram form in R

I am not an expert on stats, but I've been trying to plot a cdf out of an array of points. I've tried both R and Python. This is my example set of points: (1.5, 1.5, 2.5, 3.5, 3.5, 3.5, 4.5, 5.5, 5.5, 6)
Using the ecdf function in R, I manage to get this:
This was my code:
data <- c(1.5,1.5,2.5,3.5,3.5,3.5,4.5,5.5,5.5,6)
plot(ecdf(data))
Is there a way to get the same plotted as a histogram or would that be fundamentally wrong?
Is this what you mean?
par(mar = c(5,5,2,5))
data <- c(1.5,1.5,2.5,3.5,3.5,3.5,4.5,5.5,5.5,6)
h <- hist(
data,
breaks = seq(0, 10, 1),
xlim = c(0,10))
par(new = T)
ec <- ecdf(data)
plot(x = h$mids, y=ec(h$mids)*max(h$counts), col = rgb(0,0,0,alpha=0), axes=F, xlab=NA, ylab=NA)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
axis(4, at=seq(from = 0, to = max(h$counts), length.out = 11), labels=seq(0, 1, 0.1), col = 'red', col.axis = 'red')
mtext(side = 4, line = 3, 'Cumulative Density', col = 'red')
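If the goal is literally a histogram whose bars show the cumulative count, another option is to replace the counts of the histogram object with their running total before plotting. This is a sketch of that alternative rather than the answer's method; the name cum_hist is made up, and the bins are the same as in the code above.

```r
# Open a null PDF device so the example draws without creating a file
pdf(NULL)

data <- c(1.5, 1.5, 2.5, 3.5, 3.5, 3.5, 4.5, 5.5, 5.5, 6)

# Compute the ordinary histogram without drawing it
h <- hist(data, breaks = seq(0, 10, 1), plot = FALSE)

# Copy the histogram object and swap in the cumulative counts
cum_hist <- h
cum_hist$counts <- cumsum(h$counts)

# Draw the bars; each bar now shows how many points fall at or below its bin
plot(cum_hist, xlim = c(0, 10), main = "Cumulative histogram",
     xlab = "data", ylab = "Cumulative count")

invisible(dev.off())
```

Dividing cum_hist$counts by length(data) before plotting would put the bars on a 0 to 1 scale, which matches ecdf(data) evaluated at the right-hand edge of each bin.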

Polygon function creates strange shape

I'm trying to plot multiple distributions on the same plot. For some reason when I use the polygon function, it doesn't sit on the x-axis, and starts to levitate.
Grateful for any advice on how to stop this!
x <- seq(0,5,length=1000)
plot(x = x,
y = dnorm(x, 1.5, 0.4),
type = "l",
col = "white",
axes = FALSE,
mgp = c(2, 2, 2),
ylim=c(0,2), # Set limit of y-axis
frame.plot=TRUE,
xlab = "theta",
ylab = "plausibility",
font.main = 1,
main=paste("Distributions"),
lwd=2
)
polygon(x,dnorm(x, 1.5, 0.4),col=1,border = NULL)
polygon(x,dnorm(x, 1, 0.5),col=2,border = NULL)
As the documentation of ?polygon states
x, y vectors containing the coordinates of the vertices of the polygon.
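polygon() joins the last vertex back to the first, so the shape only sits on the x-axis if it starts and ends at y = 0. The curves already end near 0 at x = 5, so we have to add 0 as a first vertex to both the x and the y coordinates.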
polygon(x=c(0, x), y=c(0, dnorm(x, 1.5, 0.4)), col=1, border=NULL)
polygon(x=c(0, x), y=c(0, dnorm(x, 1, 0.5)), col=2, border=NULL)
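A slightly more general sketch, assuming the curve may not be close to zero at either end, drops the polygon to the baseline at both the first and the last x value. The names poly_x and poly_y are illustrative and not from the original answer.

```r
# Open a null PDF device so the example draws without creating a file
pdf(NULL)

x <- seq(0, 5, length.out = 1000)
y <- dnorm(x, mean = 1, sd = 0.5)

# Add a baseline vertex before the first point and after the last one,
# so the closing edge of the polygon runs along y = 0
poly_x <- c(x[1], x, x[length(x)])
poly_y <- c(0, y, 0)

# Empty plot with the same axis ranges as in the question
plot(x, y, type = "n", ylim = c(0, 2),
     xlab = "theta", ylab = "plausibility")
polygon(poly_x, poly_y, col = 2, border = NA)

invisible(dev.off())
```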

How to make the trend-line in a scatter plot respect the boundaries of the x-axis?

I am creating a plot where I plot the variable on the X-axis against that on the Y-axis, and I am adding histograms of the variables as well. I have added a trend-line to the plot using abline().
The problem is that it does not appear to respect the xlim = c(0, 20) in the plot region, as it extends beyond the limits of the x-axis. I tried playing around with the xpd option, but to no avail. Next I tried fiddling with the different par() options, but found nothing that could help with this issue.
What I want is for the trend-line to be the exact length of the x-axis. Any help is much appreciated. In this particular case the trend-line is almost flat, but the slope will change when I do the same for other variables.
MWE -- NOTE: I am only providing 15 data points to illustrate the issue so the graph will differ from the image provided.
df.data <- data.frame(id = 1:15,
ll = c(-9.53026, -6.50640,-6.50640, -7.68535, -11.80899, -8.42790,
-6.50640, -6.50640, -7.92405, -6.50640, -8.95522, -9.99228,
-10.02286, -8.95969, -6.07313),
aspm = c(4.582104, 0.490244, 0.737765, 0.256699, 1.575931, 1.062693,
1.006984, 0.590355, 1.014370, 0.924855, 0.735989, 0.831025,
1.197886, 1.143220, 0.928068))
str.col.light.blue <- c(rgb(r = 110/255, g = 155/255, b = 225/255))
str.col.dark.blue <- c(rgb(r = 50/255, g = 100/255, b = 185/255))
layout(matrix(c(2, 4, 1, 3), 2, 2, byrow = TRUE), widths = c(5, 2), heights = c(2, 5))
layout.show(4)
par(omi = c(0.1, 0.1, 0.1, 0.1))
par(mar = c(2, 2, 0, 0))
par(mai = c(1, 1, 0, 0))
plot(df.data[, "ll"] ~ df.data[, "aspm"], col = str.col.light.blue,
xlim = c(0, 20), ylim = c(-15, -5), axes = FALSE,
xlab = "X1", ylab = "X2",
cex.lab = 1.25)
abline(a = -8.156670, b = -0.000879, lty = 5, col = "black", lwd = 2, xpd = FALSE)
axis(1, at = seq(0, 20, by = 5), labels = seq(0, 20, by = 5), cex.axis = 1)
axis(2, at = seq(-15, -5, by = 3), labels = seq(-15, -5, by = 3), cex.axis = 1, las = 1)
rect(0, -15, 20, log(1/3)*8, density = 10, angle = 45, lwd = 0.5, col = "gray")
par(mar = c(0, 2, 0, 0))
par(mai = c(0, 1, 0.25, 0))
x.hist <- hist(df.data[, "aspm"], plot = FALSE, breaks = 20)
barplot(x.hist$density, axes = FALSE, horiz = FALSE, space = 0, col = str.col.dark.blue)
par(mar = c(2, 0, 0, 0))
par(mai = c(1, 0, 0, 0.25))
y.hist <- hist(df.data[, "ll"], plot = FALSE, breaks = 20)
barplot(y.hist$density, axes = FALSE, horiz = TRUE, space = 0, col = str.col.dark.blue)
In order to avoid working out the start and end points of the segments, you can program a helper function to do it for you.
linear <- function(x, a, b) a + b*x
Then, I've used your code with the following changes. abline was replaced by segments, with all the graphics parameters you had used in your original call.
x0 <- 0
y0 <- linear(x0, a = -8.156670, b = -0.000879)
x1 <- 20
y1 <- linear(x1, a = -8.156670, b = -0.000879)
segments(x0, y0, x1, y1, lty = 5, col = "black", lwd = 2, xpd = FALSE)
This call to segments was placed where abline was.
In the final graph, I see a well-behaved segment.
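Since the slope will change for other variables, the endpoint calculation can be wrapped in a small function that takes the intercept, slope and x-limits. This is only a sketch: line_endpoints is a hypothetical helper name, and the empty plot stands in for the scatter plot from the question.

```r
# Hypothetical helper: endpoints of the line y = a + b * x over xlim
line_endpoints <- function(a, b, xlim) {
  x <- range(xlim)
  list(x0 = x[1], y0 = a + b * x[1],
       x1 = x[2], y1 = a + b * x[2])
}

# Open a null PDF device so the example draws without creating a file
pdf(NULL)

# Empty plot with the same axis limits as in the question
plot(NA, xlim = c(0, 20), ylim = c(-15, -5), xlab = "X1", ylab = "X2")

# Draw the trend line only between x = 0 and x = 20
ends <- line_endpoints(a = -8.156670, b = -0.000879, xlim = c(0, 20))
segments(ends$x0, ends$y0, ends$x1, ends$y1, lty = 5, lwd = 2)

invisible(dev.off())
```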

Adding custom tick marks on R plot

I'm plotting a cdf of some data, and I've added logarithmic scale on the "x" axis.
The ticks spacing is exactly as I want it to be, but I'd like to be able to add
some tick marks on specific points.
I don't want to change the distribution of the ticks in my plot, from n by n to m by m, I want simply to have, among the ticks from n by n, some further tick marks on some values.
I'd like to have it reflected in both x and y axis, so that I can fit a grid into these new marks throughout the graph.
So far I have the graph, and the grid -- I don't mind about having the grid behind or upon the graph, I just want to add some custom ticks.
# Cumulative Distribution
pdf("g1_3.pdf")
plot(x = f$V2, y = cumsum(f$V1), log = "x", pch = 3,
xlab = "Frequency", ylab = "P(X <= x)",
panel.first = grid(equilogs = FALSE))
axis(1, at = c(40, 150))
abline(h = 0.6, v = 40, col = "lightgray", lty = 3)
abline(h = 0.6, v = 150, col = "lightgray", lty = 3)
dev.off()
UPDATE: The graph I have so far:
Considering the initial script and the tips given by @BenBolker, I had to use:
axis(side = 1, at = c([all the ticks you want]))
in order to add the ticks in the graph. Here's the final result:
# Cumulative Distribution
pdf("g1_3.pdf")
plot(x = f$V2, y = cumsum(f$V1), log = "x", pch = 3,
xlab = "Frequency", ylab = "P(X <= x)", axes = FALSE)
ticks = c(1, 5, 10, 40, 150, 500, 1000)
axis(side = 1, at = ticks)
axis(side = 2)
abline(h = seq(0, 1, 0.2), v = ticks, col = "lightgray", lty = 3)
box()
dev.off()
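To keep the tick positions R would have chosen and only add a few extra ones, one possibility is to read the default positions with axTicks() and merge them with the custom values. This is a sketch under assumptions: freq and cum_prob are hypothetical stand-ins for f$V2 and cumsum(f$V1), and pdf(NULL) replaces the real output file so nothing is written.

```r
# Open a null PDF device so the example draws without creating a file
pdf(NULL)

# Hypothetical data standing in for f$V2 and cumsum(f$V1)
freq     <- c(1, 3, 10, 40, 150, 400, 1000)
cum_prob <- seq(0.1, 1, length.out = length(freq))

# Draw the plot without an x-axis so the ticks can be set by hand
plot(freq, cum_prob, log = "x", pch = 3, xaxt = "n",
     xlab = "Frequency", ylab = "P(X <= x)")

# Default tick positions for the log x-axis, plus the extra ones wanted
default_ticks <- axTicks(1)
ticks <- sort(unique(c(default_ticks, 40, 150)))

# Draw the axis and a grid that follows the same positions
axis(1, at = ticks)
abline(v = ticks, h = axTicks(2), col = "lightgray", lty = 3)

invisible(dev.off())
```

Note that axis() may skip some labels if neighbouring ticks are too close together, although the tick marks themselves are still drawn.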
