I'm trying to create a figure similar to the one below (taken from Ro, Russell, & Lavie, 2001). In their graph, they are plotting bars for the errors (i.e., accuracy) within the reaction time bars. Basically, what I am looking for is a way to plot bars within bars.
I know there are several challenges with creating a graph like this. First, Hadley points out that it is not possible to create a graph with two scales in ggplot2 because those graphs are fundamentally flawed (see "Plot with 2 y axes, one y axis on the left, and another y axis on the right").
Nonetheless, the graph with superimposed bars seems to solve this dual-scaling problem, and I'm trying to figure out a way to create it in R. Any help would be appreciated.
It's fairly easy in base R: use par(new = T) to draw a second plot on top of an existing one.
set.seed(54321) # for reproducibility
data.1 <- sample(1000:2000, 10)
data.2 <- sample(seq(0, 5, 0.1), 10)
# Use xpd = F to avoid plotting the bars below the axis
barplot(data.1, las = 1, col = "black", ylim = c(500, 3000), xpd = F)
par(new = T)
# Plot the new data with a different ylim, but don't plot the axis
barplot(data.2, las = 1, col = "white", ylim = c(0, 30), yaxt = "n")
# Add the axis on the right
axis(4, las = 1)
It is pretty easy to make the bars in ggplot. Here is some example code. No two y-axes though (although look here for a way to do that too).
library(ggplot2)
data.1 <- sample(1000:2000, 10)
data.2 <- sample(500:1000, 10)
ggplot(mapping = aes(x, y)) +
geom_bar(data = data.frame(x = 1:10, y = data.1), width = 0.8, stat = 'identity') +
geom_bar(data = data.frame(x = 1:10, y = data.2), width = 0.4, stat = 'identity', fill = 'white') +
theme_classic() + scale_y_continuous(expand = c(0, 0))
I cannot figure out how to get the percentage of responses at the end of the bars. I know I'm missing something within the text() function, just not sure what exactly I'm missing. Thank you!
#Training/Specialty Barplot
trainbarplot <- barplot(table(PSR$training), horiz = TRUE,
main="Respondent Distribution of Training", cex.main = 1.1, font.main = 2,
cex.lab = 0.8, cex.names = 0.4, font.axis = 4, las = 2,
xlab="Response Frequency", xlim=c(0, 40), cex.axis = 0.8,
border="black",
col=rgb (0.1, 0.1, 0.4, 0.5, 0.6),
density=c(50,40,30) , angle=c(9,11,36)
)
text(trainbarplot, table(PSR$training) - 3,
labels=paste(round(proportions(table(PSR$training))*100, 0), "%"))
Generate data
I generated some sample data to replicate your problem. Please note that you should always try to provide an example dataset :)
set.seed(123)
df1 <- data.frame(x = rnorm(10, mean=10, sd=2), y = LETTERS[1:10])
Plot the data
Here's a plot that follows the same structure as your code:
bp <- barplot(df1$x, names.arg = df1$y, horiz = TRUE)
text(x= df1$x+0.5, y= bp, labels=paste0(round(df1$x),"%"), xpd=TRUE)
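The key point for a horizontal barplot is that barplot() returns the bar midpoints, and with horiz = TRUE those midpoints are y coordinates. They therefore belong in the y argument of text(), while the bar lengths go in x. Below is a minimal sketch of that idea applied to a frequency table, as in the question. The training vector is made-up stand-in data (PSR$training was not provided), and the 20% of headroom in xlim is an arbitrary choice to leave room for the labels.

```r
# Made-up stand-in for PSR$training (the original data was not provided)
training <- c(rep("Medicine", 20), rep("Surgery", 12), rep("Other", 8))

counts <- table(training)                  # frequency of each category
pct <- round(proportions(counts) * 100)    # share of responses, in percent

# With horiz = TRUE, barplot() returns the y positions of the bar midpoints
mids <- barplot(counts, horiz = TRUE, las = 1,
                xlim = c(0, max(counts) * 1.2),  # headroom for the labels
                xlab = "Response Frequency")

# x = end of each bar, y = bar midpoint; pos = 4 puts the label to the right
text(x = as.vector(counts), y = mids, labels = paste0(pct, "%"), pos = 4)
```

To put the labels inside the bars instead, subtract an offset from x and drop pos = 4.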
Using ggplot2
You can also plot your data using ggplot2. For instance, you could first create a new column in your dataset with information on the labels...
df1$perc <- paste0(round(df1$x),"%")
Next, you can plot your data using ggplot and adding different relevant layers.
library(ggplot2)
ggplot(df1, aes(x = x, y = y)) +
geom_col() +
geom_text(aes(label = perc)) +
theme_minimal()
Good luck!
I am trying to arrange two ggplot2 plots side by side, i.e., in a two-column
layout using the package gridExtra. I am interested in ensuring that both
plots have equal plotting area (i.e., the gray plot panel is the same for both
plots) regardless of the height of the x-axis labels. As you can see in the
example below, when longer x-axis labels are used, gridExtra::grid.arrange()
seems to compensate for this by adjusting the plotting area (i.e., the grayed out
part of the plot).
# Dummy data.
data <- data.frame(x = 1:10, y = rnorm(10))
# Dummy labels.
x_labels_short <- 1:10
x_labels_long <- 100001:100010
# Common settings for both `ggplot2` plots.
layers <- list(
labs(
x = "Plot title"
),
theme(
axis.text.x = element_text(
angle = 90,
vjust = 0.5,
hjust = 1
)
)
)
# `ggplot2` plot (a).
plot_a <- ggplot(data, aes(x, y)) +
scale_x_continuous(breaks = 1:10, labels = x_labels_short) +
layers
# `ggplot2` plot (b).
plot_b <- ggplot(data, aes(x, y)) +
scale_x_continuous(breaks = 1:10, labels = x_labels_long) +
layers
# Showing the plots side by side.
gridExtra::grid.arrange(
plot_a,
plot_b,
ncol = 2
)
Output:
What I want is for both plots to (1) have equal plotting area and (2) have the x-axis
title of plot_a aligned with that of plot_b (i.e., the x-axis title of
plot_a offset based on the length of the x-axis labels of plot_b).
If this is not clear, this is what I want to achieve, shown with base
R.
# Wrapper for convenience.
plot_gen <- function(data, labels) {
plot(
NULL,
xlim = c(1, 10),
ylim = c(min(data$y), max(data$y)),
xlab = "",
ylab = "y",
xaxt = "n"
)
axis(
side = 1,
at = 1:10,
labels = labels,
las = 2
)
title(
xlab = "Plot title",
line = 4.5
)
}
# Get `par`.
old_par = par(no.readonly = TRUE)
# Set the two-column layout.
layout(matrix(1:2, ncol = 2))
# Adjust margins.
par(mar = old_par$mar + c(1.5, 0, 0, 0))
# Plot base plot one.
plot_gen(data, x_labels_short)
# Plot base plot two.
plot_gen(data, x_labels_long)
# Restore `par`.
par(old_par)
# Restore layout.
layout(1:1)
Output:
Quick mention. I found a similar question on SO (i.e.,
How to specify the size of a graph in ggplot2 independent of axis labels), however I fail to see how the
answers address the problem. Also, the plots I am trying to arrange are based
on different data and I don't think I can use a facet_wrap approach.
One suggestion: the patchwork package.
library(patchwork)
plot_a + plot_b
It also works for more complex layouts, e.g.:
(plot_a | plot_b) / plot_a
I need to map my Erosion values for different levels of tillage (columns) with three levels of soil depth (rows (A1, A2, A3)). I want all of this to be shown as a bar chart in a single plot.
Here is a reproducible example:
a<- matrix(c(1:36), byrow = T, ncol = 4)
rownames(a)<-(c("A1","B1","C1","A2","B2","C2","A3","B3","C3"))
colnames(a)<-(c("Int_till", "Redu_till", "mulch_till", "no_till"))
barplot(a[1,], xlab = "A1", ylab = "Erosion")
barplot(a[4,], xlab = "A2", ylab = "Erosion")
barplot(a[7,], xlab = "A3", ylab = "Erosion")
##I want these three barchart side by side in a single plot
## for comparison
### and need similar plots for all the "Bs" and "Cs"
### Lastly, i want these three plots in the same page.
I have seen people do similar things using "fill" in ggplot (for lines), specifying a factor that nicely separates the chart into categories, but whenever I try it I run into an error, maybe because my data is continuous.
If anybody could help me with these two things, it would be a great help. I would appreciate it.
Thank you!
We can use ggplot
library(reshape2)
library(ggplot2)
library(dplyr)
melt(a) %>%
ggplot(., aes(x = Var2, y = value, fill = Var1)) +
geom_bar(stat = 'identity',
position = position_dodge2(preserve = "single")) +
facet_wrap(~ Var1)
Set mfcol to specify a 3x3 grid and then for each row generate its bar plot. Also, you could consider adding the barplot argument ylim = c(0, max(a)) so that all graphs use the same Y axis. title and mtext can be used to set the overall title and various margin text as we do below. See ?par, ?title and ?mtext for more information.
opar <- par(mfcol = c(3, 3), oma = c(0, 3, 0, 0))
for(r in rownames(a)) barplot(a[r, ], xlab = r, ylab = "Erosion")
par(opar)
title("My Plots", outer = TRUE, line = -1)
mtext(LETTERS[1:3], side = 2, outer = TRUE, line = -1,
at = c(0.85, 0.5, 0.17), las = 2)
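The shared-axis suggestion above can be sketched as follows. This is only one way to do it: it reuses the matrix a from the question and the ylim argument mentioned in the answer, and the name shared_ylim is illustrative.

```r
# Matrix from the question: rows are depth/group combinations, columns are tillage levels
a <- matrix(1:36, byrow = TRUE, ncol = 4)
rownames(a) <- c("A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3")
colnames(a) <- c("Int_till", "Redu_till", "mulch_till", "no_till")

shared_ylim <- c(0, max(a))  # one y range for every panel

opar <- par(mfcol = c(3, 3))  # 3x3 grid, filled column by column
for (r in rownames(a)) {
  barplot(a[r, ], xlab = r, ylab = "Erosion", ylim = shared_ylim)
}
par(opar)  # restore the previous graphics settings
```

Because mfcol fills the grid column by column, A1, A2 and A3 end up side by side in the top row, with the Bs and Cs in the rows below.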
I have been struggling with rescaling the loadings (arrows) length in a ggplot2/ggfortify PCA. I have looked around extensively for an answer to this, and the only information I have found either code new biplot functions or refer to other entirely different packages for PCA (ggbiplot, factoextra), neither of which address the question I would like to answer:
Is it possible to scale/change size of PCA loadings in ggfortify?
Below is the code I have to plot a PCA using stock R functions as well as the code to plot a PCA using autoplot/ggfortify. You'll notice in the stock R plots I can scale the loads by simply multiplying by a scalar (*20 here) so my arrows aren't cramped in the middle of the PCA plot. Using autoplot...not so much. What am I missing? I'll move to another package if necessary but would really like to have a better understanding of ggfortify.
On other sites I have found, the graph axes limits never seem to exceed +/- 2. My graph goes +/- 20, and the loadings sit staunchly near 0, presumably at the same scale as graphs with smaller axes. I would still like to plot PCA using ggplot2, but if ggfortify won't do it then I need to find another package that will.
#load data geology rocks frame
georoc <- read.csv("http://people.ucsc.edu/~mclapham/earth125/data/georoc.csv")
#load libraries
library(ggplot2)
library(ggfortify)
geo.na <- na.omit(georoc) #remove NA values
geo_matrix <- as.matrix(geo.na[,3:29]) #create matrix of continuous data in data frame
pca.res <- prcomp(geo_matrix, scale = T) #perform PCA using correlation matrix (scale = T)
summary(pca.res) #return summary of PCA
#plotting in stock R
plot(pca.res$x, col = c("salmon","olivedrab","cadetblue3","purple")[geo.na$rock.type], pch = 16, cex = 0.2)
#make legend
legend("topleft", c("Andesite","Basalt","Dacite","Rhyolite"),
col = c("salmon","olivedrab","cadetblue3","purple"), pch = 16, bty = "n")
#add loadings and text
arrows(0, 0, pca.res$rotation[,1]*20, pca.res$rotation[,2]*20, length = 0.1)
text(pca.res$rotation[,1]*22, pca.res$rotation[,2]*22, rownames(pca.res$rotation), cex = 0.7)
#plotting PCA
autoplot(pca.res, data = geo.na, colour = "rock.type", #plot results, name using original data frame
loadings = T, loadings.colour = "black", loadings.label = T,
loadings.label.colour = "black")
The data comes from an online file from a class I'm taking, so you could just copy this if you have the ggplot2 and ggfortify packages installed. Graphs below.
R plot of what I want ggplot to look like
What ggplot actually looks like
Edit:
Adding reproducible code below.
library(dplyr) # provides %>% and select()
iris.res <-
iris %>%
select(Sepal.Length:Petal.Width) %>%
as.matrix(.) %>%
prcomp(., scale = F)
autoplot(iris.res, data = iris, size = 4, col = "Species", shape = "Species",
x = 1, y = 2, #components 1 and 2
loadings = T, loadings.colour = "grey50", loadings.label = T,
loadings.label.colour = "grey50", loadings.label.repel = T) + #loadings are arrows
geom_vline(xintercept = 0, lty = 2) +
geom_hline(yintercept = 0, lty = 2) +
theme_bw() +
theme(aspect.ratio = 1) # set after theme_bw(), which would otherwise reset it
This answer is probably long after the OP needs it, but I'm offering it because I have been wrestling with the same issue for a while, and maybe I can save someone else the same effort.
library(ggplot2)
# Load data
iris <- data.frame(iris)
# Do PCA
PCA <- prcomp(iris[,1:4])
# Extract PC axes for plotting
PCAvalues <- data.frame(Species = iris$Species, PCA$x)
# Extract loadings of the variables
PCAloadings <- data.frame(Variables = rownames(PCA$rotation), PCA$rotation)
# Plot
ggplot(PCAvalues, aes(x = PC1, y = PC2, colour = Species)) +
geom_segment(data = PCAloadings, aes(x = 0, y = 0, xend = (PC1*5),
yend = (PC2*5)), arrow = arrow(length = unit(1/2, "picas")),
color = "black") +
geom_point(size = 3) +
annotate("text", x = (PCAloadings$PC1*5), y = (PCAloadings$PC2*5),
label = PCAloadings$Variables)
To increase the arrow length, multiply the loadings used for xend and yend in the geom_segment call. With a bit of trial and error, you can work out what number to use.
To place the labels in the correct place, multiply the PC axes by the same value in the annotate call.
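If you would rather not find the multiplier by hand, one option is to derive it from the data. The sketch below is an illustrative assumption, not something built into ggplot2 or ggfortify: it picks the factor so that the longest arrow component reaches 80% of the largest absolute score on PC1/PC2. The 0.8 and the names arrow_scale and scaled_loadings are arbitrary.

```r
pca <- prcomp(iris[, 1:4])
scores <- pca$x[, 1:2]           # sample coordinates on PC1 and PC2
loadings <- pca$rotation[, 1:2]  # variable loadings on PC1 and PC2

# Factor that stretches the largest loading to 80% of the largest score
arrow_scale <- 0.8 * max(abs(scores)) / max(abs(loadings))

# Scaled arrow end points, usable for xend/yend and for the label positions
scaled_loadings <- data.frame(Variables = rownames(loadings),
                              loadings * arrow_scale)
```

The PC1 and PC2 columns of scaled_loadings can then replace PC1*5 and PC2*5 in the geom_segment and annotate calls.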
In R, with ecdf I can plot an empirical cumulative distribution function
plot(ecdf(mydata))
and with hist I can plot a histogram of my data
hist(mydata)
How can I plot the histogram and the ecdf in the same plot?
EDIT
I am trying to make something like this:
https://mathematica.stackexchange.com/questions/18723/how-do-i-overlay-a-histogram-with-a-plot-of-cdf
Also a bit late, here's another solution that extends @Christoph's solution with a second y-axis.
par(mar = c(5,5,2,5))
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
par(new = T)
ec <- ecdf(dt)
plot(x = h$mids, y=ec(h$mids)*max(h$counts), col = rgb(0,0,0,alpha=0), axes=F, xlab=NA, ylab=NA)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
axis(4, at=seq(from = 0, to = max(h$counts), length.out = 11), labels=seq(0, 1, 0.1), col = 'red', col.axis = 'red')
mtext(side = 4, line = 3, 'Cumulative Density', col = 'red')
The trick is the following: You don't add a line to your plot, but plot another plot on top, that's why we need par(new = T). Then you have to add the y-axis later on (otherwise it will be plotted over the y-axis on the left).
Credits go here (@tim_yates's answer) and there.
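The second axis in this approach is just a relabelling: the ECDF values (0 to 1) are multiplied by max(h$counts) so that they share the histogram's coordinate system, and the right-hand axis prints the unscaled values at the scaled positions. A small sketch of that mapping follows; the helper names to_count_scale and to_prob_scale are illustrative only, and default breaks are used here instead of the preset ones.

```r
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(dt, plot = FALSE)   # compute the bins without drawing
ec <- ecdf(dt)

k <- max(h$counts)            # height of the tallest bin
to_count_scale <- function(p) p * k   # probability -> histogram units
to_prob_scale  <- function(y) y / k   # histogram units -> probability

probs <- seq(0, 1, 0.1)               # labels for the right-hand axis
tick_pos <- to_count_scale(probs)     # where those labels are drawn

# ECDF evaluated at the bin midpoints, expressed in histogram units
ecdf_scaled <- to_count_scale(ec(h$mids))
```

These are the values that would go into axis(4, at = tick_pos, labels = probs) and lines(h$mids, ecdf_scaled).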
There are two ways to go about this. One is to ignore the different scales and use relative frequency in your histogram. This results in a harder to read histogram. The second way is to alter the scale of one or the other element.
I suspect this question will soon become interesting to you, particularly @hadley's answer.
ggplot2 single scale
Here is a solution in ggplot2. I am not sure you will be satisfied with the outcome though because the CDF and histograms (count or relative) are on quite different visual scales. Note this solution has the data in a dataframe called mydata with the desired variable in x.
library(ggplot2)
set.seed(27272)
mydata <- data.frame(x= rexp(333, rate=4) + rnorm(333))
ggplot(mydata, aes(x)) +
stat_ecdf(color="red") +
geom_histogram(aes(y = after_stat(count / sum(count)))) # relative frequency per bin
base R multi scale
Here I will rescale the empirical CDF so that instead of a max value of 1, its maximum value is whatever bin has the highest relative frequency.
h <- hist(mydata$x, freq=F)
ec <- ecdf(mydata$x)
lines(x = knots(ec),
y=(1:length(mydata$x))/length(mydata$x) * max(h$density),
col ='red')
You can try a ggplot approach with a second axis:
library(dplyr)
library(ggplot2)
set.seed(15)
a <- rnorm(500, 50, 10)
# calculate ecdf with binsize 30
binsize=30
df <- tibble(x=seq(min(a), max(a), diff(range(a))/binsize)) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(a))
# plot
ggplot() +
geom_histogram(aes(a), bins = binsize) +
geom_line(data = df, aes(x=x, y=Ecdf_scaled), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(a), name = "Ecdf"))
Edit
Since the scaling was wrong, I added a second solution, calculating everything in advance:
binsize=30
a_range= floor(range(a)) +c(0,1)
b <- seq(a_range[1], a_range[2], round(diff(a_range)/binsize)) %>% floor()
df_hist <- tibble(a) %>%
mutate(gr = cut(a,b, labels = floor(b[-1]), include.lowest = T, right = T)) %>%
count(gr) %>%
mutate(gr = as.character(gr) %>% as.numeric())
# calculate ecdf with binsize 30
df <- tibble(x=b) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(df_hist$n))
ggplot(df_hist, aes(gr, n)) +
geom_col(width = 2, color = "white") +
geom_line(data = df, aes(x=x, y=Ecdf*max(df_hist$n)), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(df_hist$n), name = "Ecdf"))
As already pointed out, this is problematic because the plots you want to merge have such different y-scales. You can try
set.seed(15)
mydata<-runif(50)
hist(mydata, freq=F)
lines(ecdf(mydata))
to get
Although a bit late... Another version which is working with preset bins:
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
ec <- ecdf(dt)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
lines(x = c(0,100), y=c(1,1)*max(h$counts), col ='red', lty = 3) # indicates 100%
x90 <- h$mids[which.min(abs(ec(h$mids) - 0.9))] # bin midpoint where the ECDF is closest to 90%
lines(x = c(x90, x90), y = c(0, max(h$counts)), col ='black', lty = 3) # indicates where 90% is reached
(Only the second y-axis is not working yet...)
In addition to the previous answers, I wanted to have ggplot do the tedious calculation (in contrast to @Roman's solution, which was kindly updated upon my request), i.e., calculate and draw the histogram and calculate and overlay the ECDF. I came up with the following (pseudocode):
# 1. Prepare the plot
plot <- ggplot() + geom_histogram(...)
# 2. Get the max value of Y axis as calculated in the previous step
maxPlotY <- max(ggplot_build(plot)$data[[1]]$y)
# 3. Overlay scaled ECDF and add secondary axis
plot +
stat_ecdf(aes(y=..y..*maxPlotY)) +
scale_y_continuous(name = "Density", sec.axis = sec_axis(trans = ~./maxPlotY, name = "ECDF"))
This way you don't need to calculate everything beforehand and feed the results to ggplot. Just sit back and let it do everything for you!