How to remove repeated items in an R chart legend?

I'm working with the starwars dataset (from the dplyr package), and I want to make a graph where the independent variable is the characters' height and the dependent variable is their body mass. I also want to distinguish species by color:
library(dplyr)
starwars
par(mar = c(5.3, 4.3, 4.3, 8.3), xpd = TRUE)
plot(starwars$mass ~ starwars$height, ylim = c(0, 200),
col = as.factor(starwars$species), bty = "l")
legend("topright", inset = c(-0.2, 0), legend = as.factor(starwars$species),
cex = 0.50, col = as.factor(starwars$species),
ncol = 2, pch = 16)
Note that some species are repeated in the legend. How can I exclude these repetitions?

Include only the unique values in the legend:
plot(starwars$mass ~ starwars$height, ylim = c(0, 200),
col = as.factor(starwars$species), bty = "l")
legend("topright", inset = c(-0.2, 0),
legend = as.factor(unique(starwars$species)),
cex = 0.50, col = as.factor(unique(starwars$species)),
ncol = 2, pch = 16)
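A slightly more robust variant (a sketch, not the answerer's code): build the legend from levels() of a single factor, so that labels and colours come from the same source, and use a palette with one colour per species. The default palette() has only 8 colours, so with dozens of species the colours repeat even once the labels are unique. This assumes dplyr is installed (only for the starwars data) and R >= 3.6.0 for hcl.colors(); the palette name "Dark 3" is just one possible choice. The plot also uses pch = 16 so the points match the legend symbols.

```r
library(dplyr)  # only needed for the starwars data

# One factor for everything: its levels are the unique, sorted, non-NA species
species <- factor(starwars$species)

# One colour per level; the 8-colour default palette() would recycle colours
pal <- hcl.colors(nlevels(species), palette = "Dark 3")

par(mar = c(5.3, 4.3, 4.3, 8.3), xpd = TRUE)
# pal[species] indexes by the factor's integer codes; NA species give an
# NA colour, so those characters are not drawn (as in the original code)
plot(starwars$height, starwars$mass, ylim = c(0, 200),
     col = pal[species], pch = 16, bty = "l",
     xlab = "height", ylab = "mass")
legend("topright", inset = c(-0.2, 0), legend = levels(species),
       col = pal, pch = 16, cex = 0.5, ncol = 2)
```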

Related

R - How do I line up axes between a barplot and matplot?

I have created a side-by-side plot in R where both plots are supposed to use the same y-axis. The plot on the left is a barplot and the plot on the right is a matplot, and in both plots I have set the same y-axis range, ylim = c(0, YMAX). Unfortunately, as you can see below, the two plots do not use the same range: the barplot runs right to the edges of the axis, whereas the matplot adds a buffer at each end. Consequently, the y-axes of the two plots do not line up as intended.
# Create layout for plot
LAYOUT <- matrix(c(rep(1, 2), 2:3), nrow = 2, ncol = 2, byrow = TRUE)
layout(LAYOUT, heights = c(0.1, 1))
# Create plot matrix
par(mar = c(0.5, 2.1, 0.5, 2.1), mgp = c(3, 1, 0), las = 0)
plot.new()
text(0.5, 0.5, 'Barplot and Violin Plot', cex = 1.2, font = 2)
par(mar = c(5.1, 4.1, 2.1, 2.1), mgp = c(3, 1, 0), las = 0)
# Generate data for plot
x <- 1:100
y <- rchisq(100, df = 40)
# Generate plots
DENS <- density(y)
YMAX <- 1.4 * max(y)
barplot(y, names.arg = x, ylim = c(0, YMAX))
matplot(x = cbind(-DENS$y, DENS$y), y = DENS$x,
        type = c('l', 'l'), lty = c(1, 1),
        col = c('black', 'black'),
        xlim = c(-max(DENS$y), max(DENS$y)),
        ylim = c(0, YMAX),
        xlab = 'Density', ylab = '')
How do I adjust this plot to line up the y-axes? (Ideally, I would like the plot on the right to put the ticks right at the edge of the axis, as is the case on the left.)
The comment by user20650 solves my problem, so I am going to take the liberty of expanding it into a fuller answer and linking to some documentation I found on the problem. According to some lecture notes on the base graphics parameters (and the R documentation for par), many base plots in R extend the specified axis range by 4% at each end by default (axis style "r"). Setting xaxs = 'i' and yaxs = 'i' (internal style) suppresses this buffer on the x-axis and y-axis respectively, so that the axis limits go right to the edges of the plotting region.
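A quick way to see the buffer numerically (a sketch; par("usr") holds the actual limits of the plotting region after a plot is drawn): with ylim = c(0, 100), the default style "r" pads each end by 4% of the range, while "i" uses the limits exactly as given.

```r
pdf(NULL)  # throwaway device: only the coordinates are of interest

# Default axis style "r": the y range is padded by 4% at each end
plot(c(0, 100), c(0, 100), type = "n", ylim = c(0, 100))
usr_regular <- par("usr")[3:4]

# Internal style "i": the y range is exactly ylim
plot(c(0, 100), c(0, 100), type = "n", ylim = c(0, 100), yaxs = "i")
usr_internal <- par("usr")[3:4]

invisible(dev.off())
usr_regular   # -4 104
usr_internal  #  0 100
```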
Applying this solution to the y-axis in the present problem (we do not apply it to the x-axis, since we want to keep the buffer on that axis) gives the following commands. With yaxs = 'i' in the matplot call, the y-axes of the two plots now line up correctly. Hooray!
# Create layout for plot
LAYOUT <- matrix(c(rep(1, 2), 2:3), nrow = 2, ncol = 2, byrow = TRUE)
layout(LAYOUT, heights = c(0.1, 1))
# Create plot matrix
par(mar = c(0.5, 2.1, 0.5, 2.1), mgp = c(3, 1, 0), las = 0)
plot.new()
text(0.5, 0.5, 'Barplot and Violin Plot', cex = 1.2, font = 2)
par(mar = c(5.1, 4.1, 2.1, 2.1), mgp = c(3, 1, 0), las = 0)
# Generate data for plot
x <- 1:100
y <- rchisq(100, df = 40)
# Generate plots
DENS <- density(y)
YMAX <- 1.4 * max(y)
barplot(y, names.arg = x, ylim = c(0, YMAX))
# yaxs = 'i': no padding beyond ylim, so the y-axis matches the barplot
matplot(x = cbind(-DENS$y, DENS$y), y = DENS$x, yaxs = 'i',
        type = c('l', 'l'), lty = c(1, 1),
        col = c('black', 'black'),
        xlim = c(-max(DENS$y), max(DENS$y)),
        ylim = c(0, YMAX),
        xlab = 'Density', ylab = '')

My layout doesn't allow me to show xlab and ylab

It looks like I'm missing something simple, but I have no idea how to deal with it.
I used the layout() function and managed to get the layout I wanted, as shown in the picture below. I used the iris data in my code.
The problem is that the x and y labels do not appear in the output when I call plot() after this, and the x-axis and y-axis of the plots seem to overlap. I am not sure how to deal with this.
There was no problem with the x and y labels before I introduced plot.new() and par() to set up the main title of my diagram (i.e., before I added the code from plot.new() to title(), xlab and ylab were shown).
My original code draws 6 plots, including the plot.new() panel for the title, but I omitted the rest for convenience.
Here is my code below,
x <- iris$Sepal.Length
y <- iris$Species
x_min <- min(iris$Sepal.Length)
x_max <- max(iris$Sepal.Length)
y_min <- min(iris$Sepal.Width)
y_max <- max(iris$Sepal.Width)
# 3 rows: a 1 cm title strip across the top, then 3 + 2 panels
layout(matrix(c(1,1,1,1,1,1,
                2,2,3,3,4,4,
                5,5,5,6,6,6), ncol = 6, byrow = TRUE),
       heights = c(lcm(1), 1, 1))
layout.show(6)  # preview of the 6 regions
par(mar = c(1, 1, 1, 1))  # one line of margin on every side (mar takes 4 values)
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))
text(x = 0.5, y = 0.5, "Scatter and density plots for Sepal Length and Sepal Width", font = 2, cex = 1.5)
plot(...)
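A likely explanation, assuming the par(mar = ...) call is only meant for the title strip: par() settings persist, so every later panel also gets just one line of margin, while axis titles are drawn at margin line 3 by default (see mgp in ?par) and therefore fall outside the figure. A sketch that shrinks the margins only for the title and then restores them; the five panel plots are placeholders standing in for the elided plot(...) calls, not the original ones.

```r
# Layout as in the question: a 1 cm title strip, then 3 + 2 panels
layout(matrix(c(1, 1, 1, 1, 1, 1,
                2, 2, 3, 3, 4, 4,
                5, 5, 5, 6, 6, 6), ncol = 6, byrow = TRUE),
       heights = c(lcm(1), 1, 1))

# Zero margins for the title strip only; par() returns the old values
old_par <- par(mar = c(0, 0, 0, 0))
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))
text(0.5, 0.5, "Scatter and density plots for Sepal Length and Sepal Width",
     font = 2, cex = 1.5)
par(old_par)  # restore the margins before the data panels

# Placeholder panels: each now has room for its xlab/ylab
plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal.Length", ylab = "Sepal.Width")
plot(density(iris$Sepal.Length), main = "", xlab = "Sepal.Length")
plot(density(iris$Sepal.Width), main = "", xlab = "Sepal.Width")
boxplot(Sepal.Length ~ Species, data = iris)
boxplot(Sepal.Width ~ Species, data = iris)
```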
You can use the xlab and ylab arguments in title(). However, the way you have constructed the plot means that when you reset par at the end, these are drawn off the page because of their position relative to your custom axes. If you simply leave par alone, you get:
den1 = density(CDE1$V1)
den2 = density(CDE1$V2)
col1 = hsv(h = 0.65, s = 0.6, v = 0.8, alpha = 0.5)
col2 = hsv(h = 0.85, s = 0.6, v = 0.8, alpha = 0.5)
plot.new()
plot.window(xlim = c(25,65), ylim = c(0, 0.14))
axis(side = 1, pos = 0, at = seq(from = 25, to = 65, by = 5), col = "gray20",
lwd.ticks = 0.25, cex.axis = 1, col.axis = "gray20", lwd = 1.5)
axis(side = 2, pos = 25, at = seq(from = 0, to = 0.14, by = 0.02),
col = "gray20", las = 2, lwd.ticks = 0.5, cex.axis = 1,
col.axis = "gray20", lwd = 1.5)
polygon(den1$x, den1$y, col = col1, border ="black",lwd = 2)
polygon(den2$x, den2$y, col = col2, border ="black",lwd = 2)
text(52, 0.10, labels ="CDET", col =col1, cex = 1.25,font=2)
text(35, 0.03, labels ="SDFT", col =col2, cex = 1.25,font=2)
title(main = "Gestational Day 100/283",
xlab = "Fibril Diameter (nm)",
ylab = "density")
Of course, you could get a similar plot with less code and much easier adjustments using ggplot:
library(ggplot2)
ggplot(tidyr::pivot_longer(CDE1, 1:2), aes(value, fill = name)) +
geom_density() +
scale_fill_manual(values = c(col1, col2), labels = c("CDET", "SDFT")) +
scale_x_continuous(breaks = seq(25, 65, 5), limits = c(25, 65)) +
scale_y_continuous(breaks = seq(0, 0.14, 0.02), limits = c(0, 0.14)) +
theme_classic(base_size = 16) +
labs(title = "Gestational Day 100/283", x = "Fibril Diameter (nm)",
fill = NULL) +
theme(plot.title = element_text(hjust = 0.5))
Data used
Obviously, we don't have your data, so I had to create a reproducible approximation:
set.seed(123)
CDE1 <- data.frame(V1 = rnorm(20, 47.5, 4), V2 = rnorm(20, 44, 5))

R barplot outside region

So I've created a barplot for some data I have with the following code:
bp <- barplot(data,
              beside = TRUE,
              col = c("red", "blue", "green"),
              space = c(0, 0.4),
              width = 0.2,
              xlim = c(0, 2),
              ylim = c(0, 1.1),
              legend = c("KNN", "MF1", "MF2"),
              args.legend = list(x = 2.7, y = 1.2),
              yaxt = 'n',
              xpd = TRUE,
              srt = 90
)
text(x = bp,
     y = data + 0.05,
     labels = as.character(round(data, digits = 2)),
     srt = 90,
     xpd = TRUE
)
I think the plot looks neat, but it does not seem to fit in the plotting region. Is there a way to keep the bar width from picture 1 and still show all 8 groups? My workaround so far is to reduce the bar width, as shown in picture 2.
Picture 1
Picture 2
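One way to see what is going on (a sketch; `data` below is a random, hypothetical stand-in for the real 3 x 8 matrix): barplot(..., plot = FALSE) returns the bar midpoints without drawing. With width = 0.2 and space = c(0, 0.4), each group of three bars plus its gap takes 3 * 0.2 + 0.4 * 0.2 = 0.68 x units, so 8 groups need about 5.44 units, while xlim = c(0, 2) has room for fewer than three groups.

```r
# Hypothetical stand-in for `data`: 3 methods (rows) x 8 groups (columns)
set.seed(42)
data <- matrix(runif(24, 0.5, 1), nrow = 3,
               dimnames = list(c("KNN", "MF1", "MF2"), NULL))

# plot = FALSE returns the bar midpoints without drawing anything
mp <- barplot(data, beside = TRUE, space = c(0, 0.4), width = 0.2,
              plot = FALSE)
x_right <- max(mp) + 0.2 / 2  # right edge of the last bar
x_right                       # about 5.44, far beyond xlim = c(0, 2)

# An xlim that covers all 8 groups
barplot(data, beside = TRUE, space = c(0, 0.4), width = 0.2,
        col = c("red", "blue", "green"),
        xlim = c(0, x_right), ylim = c(0, 1.1), yaxt = "n")
```

The bars then become narrower on the page, because the same plotting width now has to hold about 5.44 / 2, i.e. roughly 2.7 times as many x units. Keeping the bar width of picture 1 therefore means making the device wider by roughly that factor, for example through the width argument of png() or pdf().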

R - Plot with two y axis, bars and points not aligned

For a customer, I'm trying to make a combined barplot and line plot (with points) with two y-axes.
Problem: My bars and points are not aligned.
Background: We have several machines and are measuring their number of on/off switches and the amount of time each machine is running. We want both pieces of information together in one plot to save space, because we have several machines.
The data is aggregated by day or hour. Here's some sample data:
date <- seq(as.Date("2016-10-01"), as.Date("2016-10-10"), "day")
counts <- c(390, 377, 444, NA, NA, NA, NA, 162, 166, 145)
runtime <- c(56.8, 59.4, 51.0, NA, NA, NA, NA, 38.5, 40.9, 43.4)
df <- data.frame(date = date, counts = counts, runtime = runtime)
Here's what I tried so far:
par(mar = c(3,4,4,4) + 0.3)
barplot(df$runtime, col = "palegreen2", border = "NA", ylab = "runtime in [%]",
ylim = c(0,100), font.lab = 2)
par(new = TRUE)
ymax <- max(df$counts, na.rm = TRUE) * 1.05
plot(df$date, df$counts, type = "n", xlab = "", ylab = "", yaxt = "n",
main = "Machine 1", ylim = c(0, ymax))
abline(v = date, col = "red", lwd = 2.5)
lines(df$date, df$counts, col = "blue", lwd = 2)
points(df$date, df$counts, pch = 19, cex = 1.5)
axis(4)
mtext("Number of switching operations", side = 4, line = 3, font = 2)
I found some inspiration for two axes here: http://robjhyndman.com/hyndsight/r-graph-with-two-y-axes/
What can I do to get bars with their middle aligned with the points of the lineplot?
The problem you are running into is the second plot() call after the barplot. It sets up a new coordinate system: its x-axis runs over the dates (padded by 4% at each end), while barplot() places the bars at its own x positions, so the subsequent points end up shifted relative to the bars.
Here is a quick workaround that instead rescales the points and lines onto the barplot's coordinate system. It saves the value returned by barplot(), which holds the x positions of the bar midpoints. When you then draw the abline, lines and points using bp as the x variable, they are correctly aligned with the bars.
ymax <- max(df$counts, na.rm = TRUE) * 1.05
par(mar = c(4.1, 5.1, 2.1, 5.1))
# barplot() returns the x positions of the bar midpoints
bp <- barplot(df$runtime, col = "palegreen2", border = "NA", ylab = "runtime in [%]",
              ylim = c(0, 100), font.lab = 2, xlim = c(0.2, 12))
abline(v = bp, col = "red", lwd = 2.5)
# rescale the counts onto the 0-100 scale of the bars
lines(bp, df$counts/ymax*100, col = "blue", lwd = 2)
points(bp, df$counts/ymax*100, pch = 19, cex = 1.5)
axis(4, at = c(0, 20, 40, 60, 80, 100), labels = c("0", "100", "200", "300", "400", "500"))
mtext("Number of switching operations", side = 4, line = 3, font = 2)
axis(1, at = bp, labels = df$date)
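To see why the x positions have to come from bp, it helps to look at where barplot() actually puts the bars. With the defaults (width = 1, space = 0.2), bar i is centred at 1.2 * i - 0.5, not at i and certainly not at a date. A small check with the question's runtime values; plot = FALSE returns the midpoints without drawing:

```r
runtime <- c(56.8, 59.4, 51.0, NA, NA, NA, NA, 38.5, 40.9, 43.4)

# plot = FALSE: compute the bar positions but draw nothing
bp <- barplot(runtime, plot = FALSE)
as.vector(bp)  # 0.7 1.9 3.1 ... 11.5, i.e. 1.2 * (1:10) - 0.5
```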
@emilliman: Thank you for your patience and input! Your plot is not completely correct, because the scaling of the second y-axis does not fit the points' values, but your idea helped me find a solution!
Here's my new code:
library(plyr)
ymax <- max(df$counts, na.rm = TRUE)
ymax_up <- round_any(ymax, 100, f = ceiling)
ylab <- ymax_up/5 * c(0:5)
par(mar = c(3,4,4,4) + 0.3)
bp <- barplot(df$runtime, col = "palegreen2", border = "NA", ylab = "runtime in [%]",
ylim = c(0,100), font.lab = 2, main = "Machine 1")
abline(v = bp, col = "red", lwd = 2.5)
lines(bp, 100/ymax_up * df$counts, col = "blue", lwd = 2)
points(bp, 100/ymax_up * df$counts, pch = 19, cex = 1.5)
axis(4,at=c(0,20,40,60,80,100), labels= as.character(ylab))
mtext("Number of switching operations", side = 4, line = 3, font = 2)
xlab <- as.character(df$date, format = "%b %d")
axis(1, at=bp, labels = xlab)
abline(h = c(0,100))
(http://i.imgur.com/9YtYGSD.png)
Maybe this is helpful for others who run into this problem.
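If you would rather not load plyr just for round_any(), the rounding and the mapping between the two scales can be written in base R. A minimal sketch using the sample counts from the question; round_up() is a hypothetical helper name, not a plyr function:

```r
# Base-R equivalent of plyr::round_any(x, accuracy, f = ceiling)
round_up <- function(x, accuracy) ceiling(x / accuracy) * accuracy

counts <- c(390, 377, 444, NA, NA, NA, NA, 162, 166, 145)
ymax_up <- round_up(max(counts, na.rm = TRUE), 100)  # 444 -> 500

# Counts mapped onto the 0-100 scale used by the runtime bars
counts_scaled <- 100 / ymax_up * counts

# Right-axis ticks sit at bar-scale positions but are labelled in counts
right_at     <- seq(0, 100, by = 20)
right_labels <- right_at * ymax_up / 100  # 0 100 200 300 400 500
```

These would plug into the code above as points(bp, counts_scaled, ...) and axis(4, at = right_at, labels = right_labels).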

Remove 'y' label from plot in R

Does anyone know how to remove the 'y' label from the y-axis while keeping the variable names in the following plot:
m <- matrix(rnorm(12), ncol = 2, nrow = 6)  # 6 variables x 2 columns of values
y <- 1:6
par(mar = c(5, 7, 4, 2) + .01)
plot(m[, 1], y, cex = .8, pch = 20, xlab = "Standardized Mean Differences", col = "darkblue", main = "Balance Assessment", yaxt = "n")
points(m[, 2], y, cex = .8, pch = 20, col = "cyan")
abline(v = 0, col = "gray50", lty = 2)
text(y = 1:6, par("usr")[1], labels = c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6"), pos = 2, xpd = TRUE, srt = 0, cex = .8, font = 1, col = "blue")
It's minor, but it's driving me crazy. Thanks!
Just set ylab = '' in the plot() call to remove it.
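For completeness, here is the question's plot with that fix applied (a sketch; the random data from set.seed(1) are placeholders, and paste0() simply builds the same Var1 to Var6 labels):

```r
set.seed(1)  # placeholder data, only for reproducibility
m <- matrix(rnorm(12), ncol = 2, nrow = 6)
y <- 1:6

par(mar = c(5, 7, 4, 2) + .01)
plot(m[, 1], y, cex = .8, pch = 20, col = "darkblue",
     xlab = "Standardized Mean Differences",
     ylab = "",                       # empty axis title: no 'y' label
     main = "Balance Assessment", yaxt = "n")
points(m[, 2], y, cex = .8, pch = 20, col = "cyan")
abline(v = 0, col = "gray50", lty = 2)
# Variable names drawn in the left margin, just left of the plot region
text(par("usr")[1], y, labels = paste0("Var", 1:6), pos = 2, xpd = TRUE,
     cex = .8, col = "blue")
```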
