This question has been raised a number of times on StackOverflow over the years (see here and here); however, I have yet to come across a way that I'm satisfied with for easily adding unlabelled minor ticks to my ggplot axes.
Let's generate some dummy data to play around with:
df <- data.frame(x = rnorm(1000, mean = 25, sd = 5),
y = rnorm(1000, mean = 23, sd = 3))
There are two methods I've come across for adding unlabelled minor ticks.
Method 1 - Manually construct axis label vectors
Concatenate the values that you would like to appear at major ticks with empty strings (""), which leave the minor ticks unlabelled. If you would like to add just one unlabelled minor tick in-between major tick values, you can construct the vector of axis labels like so:
axis_values <- c(0, "", 10, "", 20, "", 30, "", 40, "", 50)
Or if you'd like n unlabelled minor ticks:
# Where n = 2 and for an axis range [0, 50]
axis_values <- c(0, rep("", 2), 15, rep("", 2), 30, rep("", 2), 45, "")
The user can then supply this vector to the 'labels' argument in the ggplot2::scale_x_continuous or ggplot2::scale_y_continuous functions as long as the length of the vector of labels matches the length of the vector supplied to the 'breaks' argument in the same functions.
ggplot(df, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(breaks = seq(0, 50, 5), labels = axis_values, limits = c(0, 50)) +
scale_y_continuous(breaks = seq(0, 50, 5), labels = axis_values, limits = c(0, 50))
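As a quick sanity check on the length requirement described above, the sketch below pairs each break with its label before any plotting is done. The names `brks` and `axis_values_n2` are introduced here only for illustration.

```r
# Breaks used in the plot above: one tick every 5 units
brks <- seq(0, 50, 5)

# One unlabelled minor tick between major ticks (major ticks every 10)
axis_values <- c(0, "", 10, "", 20, "", 30, "", 40, "", 50)

# Two unlabelled minor ticks between major ticks (major ticks every 15)
axis_values_n2 <- c(0, rep("", 2), 15, rep("", 2), 30, rep("", 2), 45, "")

# ggplot2 stops with an error when breaks and labels differ in length,
# so it is cheap to check this up front
stopifnot(length(axis_values) == length(brks),
          length(axis_values_n2) == length(brks))

# A side-by-side view makes a misplaced label easy to spot
print(data.frame(breaks = brks, n1 = axis_values, n2 = axis_values_n2))
```

Note that mixing numbers and "" in c() coerces everything to character, which is what the 'labels' argument expects here.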
Method 2 - Define your own function for generating axis label vectors
This post describes a function to which the user can supply a vector of values to appear at major ticks, along with the number of unlabelled minor ticks desired:
insert_minor <- function(major_labs, n_minor) {
labs <- c( sapply( major_labs, function(x) c(x, rep("", n_minor) ) ) )
labs[1:(length(labs)-n_minor)]
}
# Generate plot
ggplot(df, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(breaks = seq(0, 50, 5), labels = insert_minor(major_labs = seq(0, 50, 10),
n_minor = 1), limits = c(0, 50)) +
scale_y_continuous(breaks = seq(0, 50, 5), labels = insert_minor(major_labs = seq(0, 50, 10),
n_minor = 1), limits = c(0, 50))
Method 2 is the best way of generating unlabelled minor ticks I've seen yet. However, its drawbacks are:
Not dummy-proof - Users need to make sure that the value given to the 'n_minor' argument is compatible with the data supplied to the 'breaks' and 'major_labs' arguments. Call me lazy, but I don't want to think about this when I'm trying to produce plots quickly.
Function management required - When you want to use this function in another script, you have to retrieve it from the last script you used it in, or alternatively perhaps you can package it up in a library to call in future scripts.
In my eyes, the ideal solution is for the ggplot2 developers to add an argument to scale_x_continuous or scale_y_continuous ggplot2 functions that takes a user-defined value for the number of unlabelled minor ticks the user would like to add to their plot axes, which then takes the vector supplied to the 'breaks' argument and determines 'major_labs' in the background out of the user's sight.
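To make the wish concrete, here is a minimal sketch of that behaviour in user code, assuming the major ticks should simply be every (n_minor + 1)-th break. `label_every` is a hypothetical helper, not a ggplot2 function.

```r
# Hypothetical helper (not part of ggplot2): label every (n_minor + 1)-th
# break and blank out the rest, so labels always match the breaks in length
label_every <- function(breaks, n_minor) {
  position <- seq_along(breaks) - 1            # 0-based position of each break
  is_major <- position %% (n_minor + 1) == 0   # TRUE at the major ticks
  ifelse(is_major, as.character(breaks), "")
}

brks <- seq(0, 50, 5)
print(label_every(brks, n_minor = 1))  # major ticks at 0, 10, 20, ...
print(label_every(brks, n_minor = 2))  # major ticks at 0, 15, 30, 45
```

The result could then be passed as `labels = label_every(brks, 1)` alongside `breaks = brks` in scale_x_continuous or scale_y_continuous, so only the breaks and the number of minor ticks have to be chosen.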
Has anyone else found any other way of computing unlabelled minor ticks in ggplot2?
A quick, simple, and kinda sleek solution would be to define this one-liner labelling function that only shows breaks that occur at your chosen multiples:
label_at <- function(n) function(x) ifelse(x %% n == 0, x, "")
So you could do:
ggplot(df, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(breaks = seq(0, 50, 5), labels = label_at(10),
limits = c(0, 50)) +
scale_y_continuous(breaks = seq(0, 50, 5), labels = label_at(5),
limits = c(0, 50))
Which you can easily take to extremes:
ggplot(df, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(breaks = 1:50, labels = label_at(10), limits = c(0, 50)) +
scale_y_continuous(breaks = 1:50, labels = label_at(10), limits = c(0, 50))
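One caveat worth hedging: `x %% n == 0` compares floating-point numbers exactly, so with non-integer spacing some intended major ticks may lose their labels. A possible variant, sketched under the assumption that a small tolerance is acceptable, is shown below; `label_at_tol` and its `tol` argument are hypothetical names.

```r
# Hypothetical variant of label_at(): a break counts as a multiple of n when
# x / n is within `tol` of a whole number, which absorbs rounding error
label_at_tol <- function(n, tol = 1e-8) {
  function(x) {
    ratio <- x / n
    ifelse(abs(ratio - round(ratio)) < tol, x, "")
  }
}

# Breaks every 0.1, labels only at multiples of 0.5
brks <- seq(0, 1.5, 0.1)
print(label_at_tol(0.5)(brks))
```

It is used the same way as label_at, e.g. `labels = label_at_tol(0.5)`.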
I'm trying to convert my ggplot to a plotly plot using ggplotly(). However, it doesn't seem to work with this code once manipulate() has been applied to the plot. Is there any other way to do it?
library(ggplot2)
library(manipulate)
grades <- data.frame(Final = 20 * runif(70))
myFinalsPlot <- function(sliderInput, initialIndex, finalIndex) {
ggplot(data.frame(grades$Final[initialIndex:finalIndex]),
aes(x = grades$Final[initialIndex:finalIndex])) +
geom_histogram(aes(y = ..density..),
binwidth = sliderInput, colour = "green", fill = "yellow") +
geom_density(alpha = 0.2, fill = "#FF6666") +
labs(x = "Marks", y = "Grades")
}
myFinalsPlot <- manipulate(myFinalsPlot(slidersInput, 1, 70),
slidersInput = slider(1, 12, step = 1, initial = 5))
First, to make your code work with the ggplot2 plot, there is an issue in your code that you need to fix. You shouldn't give the same name to your function and plot object. Replace this:
myFinalsPlot <- manipulate(myFinalsPlot(slidersInput, 1, 70),
slidersInput = slider(1, 12, step = 1, initial = 5))
with, e.g.:
myPlot <- manipulate(myFinalsPlot(slidersInput, 1, 70),
slidersInput = slider(1, 12, step = 1, initial = 5))
Now, regarding plotly plots, I don't think it is supposed to work with manipulate. I quote RStudio's website https://support.rstudio.com/hc/en-us/articles/200551906-Interactive-Plotting-with-Manipulate:
RStudio works with the manipulate package to add interactive capabilities to standard R plots.
I have been struggling with rescaling the length of the loadings (arrows) in a ggplot2/ggfortify PCA. I have looked around extensively for an answer to this, and the only information I have found either codes new biplot functions or refers to entirely different packages for PCA (ggbiplot, factoextra), neither of which addresses the question I would like to answer:
Is it possible to scale/change size of PCA loadings in ggfortify?
Below is the code I use to plot a PCA with stock R functions, as well as the code to plot a PCA using autoplot/ggfortify. You'll notice that in the stock R plots I can scale the loadings by simply multiplying by a scalar (*20 here) so my arrows aren't cramped in the middle of the PCA plot. Using autoplot... not so much. What am I missing? I'll move to another package if necessary but would really like to have a better understanding of ggfortify.
On other sites I have found, the graph axes limits never seem to exceed +/- 2. My graph goes +/- 20, and the loadings sit staunchly near 0, presumably at the same scale as graphs with smaller axes. I would still like to plot PCA using ggplot2, but if ggfortify won't do it then I need to find another package that will.
#load data geology rocks frame
georoc <- read.csv("http://people.ucsc.edu/~mclapham/earth125/data/georoc.csv")
#load libraries
library(ggplot2)
library(ggfortify)
geo.na <- na.omit(georoc) #remove NA values
geo_matrix <- as.matrix(geo.na[,3:29]) #create matrix of continuous data in data frame
pca.res <- prcomp(geo_matrix, scale = T) #perform PCA using correlation matrix (scale = T)
summary(pca.res) #return summary of PCA
#plotting in stock R
plot(pca.res$x, col = c("salmon","olivedrab","cadetblue3","purple")[geo.na$rock.type], pch = 16, cex = 0.2)
#make legend
legend("topleft", c("Andesite","Basalt","Dacite","Rhyolite"),
col = c("salmon","olivedrab","cadetblue3","purple"), pch = 16, bty = "n")
#add loadings and text
arrows(0, 0, pca.res$rotation[,1]*20, pca.res$rotation[,2]*20, length = 0.1)
text(pca.res$rotation[,1]*22, pca.res$rotation[,2]*22, rownames(pca.res$rotation), cex = 0.7)
#plotting PCA
autoplot(pca.res, data = geo.na, colour = "rock.type", #plot results, name using original data frame
loadings = T, loadings.colour = "black", loadings.label = T,
loadings.label.colour = "black")
The data comes from an online file from a class I'm taking, so you could just copy this if you have the ggplot2 and ggfortify packages installed. Graphs below.
R plot of what I want ggplot to look like
What ggplot actually looks like
Edit:
Adding reproducible code below.
library(dplyr) #provides %>% and select()
iris.res <-
iris %>%
select(Sepal.Length:Petal.Width) %>%
as.matrix(.) %>%
prcomp(., scale = F)
autoplot(iris.res, data = iris, size = 4, col = "Species", shape = "Species",
x = 1, y = 2, #components 1 and 2
loadings = T, loadings.colour = "grey50", loadings.label = T,
loadings.label.colour = "grey50", loadings.label.repel = T) + #loadings are arrows
geom_vline(xintercept = 0, lty = 2) +
geom_hline(yintercept = 0, lty = 2) +
theme_bw() + #complete theme goes first, otherwise it resets aspect.ratio
theme(aspect.ratio = 1)
This answer is probably long after the OP needs it, but I'm offering it because I have been wrestling with the same issue for a while, and maybe I can save someone else the same effort.
# Load ggplot2 and the data
library(ggplot2)
iris <- data.frame(iris)
# Do PCA
PCA <- prcomp(iris[,1:4])
# Extract PC axes for plotting
PCAvalues <- data.frame(Species = iris$Species, PCA$x)
# Extract loadings of the variables
PCAloadings <- data.frame(Variables = rownames(PCA$rotation), PCA$rotation)
# Plot
ggplot(PCAvalues, aes(x = PC1, y = PC2, colour = Species)) +
geom_segment(data = PCAloadings, aes(x = 0, y = 0, xend = (PC1*5),
yend = (PC2*5)), arrow = arrow(length = unit(1/2, "picas")),
color = "black") +
geom_point(size = 3) +
annotate("text", x = (PCAloadings$PC1*5), y = (PCAloadings$PC2*5),
label = PCAloadings$Variables)
To increase the arrow length, multiply the loadings used for xend and yend in the geom_segment call by a constant (5 in the code above). With a bit of trial and error, you can work out what number to use.
To place the labels at the arrow tips, multiply the loadings by the same value in the annotate call.
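If trial and error gets tedious, the multiplier can also be derived from the data. The sketch below is one possible heuristic, not something ggfortify or ggplot2 provides: it picks the largest factor that keeps every arrow tip inside the range of the scores on both axes, then shrinks it by an arbitrary 0.8 to leave room for labels. The names `arrow_scale` and `scaled_loadings` are assumptions made for illustration.

```r
# PCA on the four numeric iris columns, as in the code above
pca <- prcomp(iris[, 1:4])
scores <- pca$x[, 1:2]           # observations on PC1 and PC2
loadings <- pca$rotation[, 1:2]  # variable loadings on PC1 and PC2

# Largest multiplier for which the longest arrow still fits within the
# spread of the scores on each axis; 0.8 is an arbitrary safety margin
arrow_scale <- 0.8 * min(
  max(abs(scores[, 1])) / max(abs(loadings[, 1])),
  max(abs(scores[, 2])) / max(abs(loadings[, 2]))
)

# Rescaled loadings, ready to use for xend/yend and for the label positions
scaled_loadings <- loadings * arrow_scale
print(arrow_scale)
print(scaled_loadings)
```

In the plot above, `arrow_scale` would take the place of the hard-coded 5 in both the geom_segment and the annotate calls.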
I'm trying to create a figure similar to the one below (taken from Ro, Russell, & Lavie, 2001). In their graph, they are plotting bars for the errors (i.e., accuracy) within the reaction time bars. Basically, what I am looking for is a way to plot bars within bars.
I know there are several challenges with creating a graph like this. First, Hadley points out that it is not possible to create a graph with two scales in ggplot2 because those graphs are fundamentally flawed (see Plot with 2 y axes, one y axis on the left, and another y axis on the right)
Nonetheless, the graph with superimposed bars seems to solve this dual scaling problem, and I'm trying to figure out a way to create it in R. Any help would be appreciated.
It's fairly easy in base R, by using par(new = T) to add to an existing graph
set.seed(54321) # for reproducibility
data.1 <- sample(1000:2000, 10)
data.2 <- sample(seq(0, 5, 0.1), 10)
# Use xpd = F to avoid plotting the bars below the axis
barplot(data.1, las = 1, col = "black", ylim = c(500, 3000), xpd = F)
par(new = T)
# Plot the new data with a different ylim, but don't plot the axis
barplot(data.2, las = 1, col = "white", ylim = c(0, 30), yaxt = "n")
# Add the axis on the right
axis(4, las = 1)
It is pretty easy to make the bars in ggplot. Here is some example code. No two y-axes though (although look here for a way to do that too).
library(ggplot2)
data.1 <- sample(1000:2000, 10)
data.2 <- sample(500:1000, 10)
ggplot(mapping = aes(x, y)) +
geom_bar(data = data.frame(x = 1:10, y = data.1), width = 0.8, stat = 'identity') +
geom_bar(data = data.frame(x = 1:10, y = data.2), width = 0.4, stat = 'identity', fill = 'white') +
theme_classic() + scale_y_continuous(expand = c(0, 0))
In R, with ecdf I can plot an empirical cumulative distribution function
plot(ecdf(mydata))
and with hist I can plot a histogram of my data
hist(mydata)
How can I plot the histogram and the ECDF in the same plot?
EDIT
I'm trying to make something like this:
https://mathematica.stackexchange.com/questions/18723/how-do-i-overlay-a-histogram-with-a-plot-of-cdf
Also a bit late, but here's another solution that extends @Christoph's solution with a second y-axis.
par(mar = c(5,5,2,5))
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
par(new = T)
ec <- ecdf(dt)
plot(x = h$mids, y=ec(h$mids)*max(h$counts), col = rgb(0,0,0,alpha=0), axes=F, xlab=NA, ylab=NA)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
axis(4, at=seq(from = 0, to = max(h$counts), length.out = 11), labels=seq(0, 1, 0.1), col = 'red', col.axis = 'red')
mtext(side = 4, line = 3, 'Cumulative Density', col = 'red')
The trick is the following: you don't add a line to your plot but draw another plot on top of it, which is why we need par(new = T). You then have to add the right-hand y-axis afterwards (otherwise it will be plotted over the y-axis on the left).
Credits go here (@tim_yates's answer) and there.
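The rescaling behind the second axis can be checked in isolation. The sketch below assumes the same kind of data as above and only shows the arithmetic: the ECDF is multiplied by the tallest bin count to sit on the count scale, and the right-hand tick positions are divided by the same number to recover values between 0 and 1. The names `k`, `tick_positions` and `tick_labels` are introduced for illustration, and the breaks are derived from the data range rather than fixed at 0 to 100.

```r
set.seed(15)
dt <- rnorm(500, 50, 10)

# Bin the data without drawing; breaks are chosen to cover the whole range
h <- hist(dt, breaks = seq(floor(min(dt)), ceiling(max(dt)), 1), plot = FALSE)
ec <- ecdf(dt)

# Scale factor: the height of the tallest bar
k <- max(h$counts)

# ECDF expressed in count units, i.e. what is drawn over the histogram
ecdf_on_count_scale <- ec(h$mids) * k

# Right-hand axis: ticks are placed in count units but labelled in ECDF units
tick_positions <- seq(0, k, length.out = 11)
tick_labels <- tick_positions / k

print(data.frame(position = tick_positions, label = tick_labels))
```

Because both directions use the same factor `k`, the red curve and the red axis stay consistent whatever the bin width is.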
There are two ways to go about this. One is to ignore the different scales and use relative frequency in your histogram. This results in a harder to read histogram. The second way is to alter the scale of one or the other element.
I suspect this question will soon become interesting to you, particularly @hadley's answer.
ggplot2 single scale
Here is a solution in ggplot2. I am not sure you will be satisfied with the outcome though because the CDF and histograms (count or relative) are on quite different visual scales. Note this solution has the data in a dataframe called mydata with the desired variable in x.
library(ggplot2)
set.seed(27272)
mydata <- data.frame(x= rexp(333, rate=4) + rnorm(333))
ggplot(mydata, aes(x)) +
stat_ecdf(color="red") +
geom_bar(aes(y = (..count..)/sum(..count..)))
base R multi scale
Here I will rescale the empirical CDF so that instead of a max value of 1, its maximum value is whatever bin has the highest relative frequency.
h <- hist(mydata$x, freq=F)
ec <- ecdf(mydata$x)
lines(x = knots(ec),
y=(1:length(mydata$x))/length(mydata$x) * max(h$density),
col ='red')
You can try a ggplot approach with a second axis:
library(dplyr) # tibble(), bind_cols(), mutate() and %>%
library(ggplot2)
set.seed(15)
a <- rnorm(500, 50, 10)
# calculate ecdf with binsize 30
binsize=30
df <- tibble(x=seq(min(a), max(a), diff(range(a))/binsize)) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(a))
# plot
ggplot() +
geom_histogram(aes(a), bins = binsize) +
geom_line(data = df, aes(x=x, y=Ecdf_scaled), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(a), name = "Ecdf"))
Edit
Since the scaling was wrong, I added a second solution that calculates everything in advance:
binsize=30
a_range= floor(range(a)) +c(0,1)
b <- seq(a_range[1], a_range[2], round(diff(a_range)/binsize)) %>% floor()
df_hist <- tibble(a) %>%
mutate(gr = cut(a,b, labels = floor(b[-1]), include.lowest = T, right = T)) %>%
count(gr) %>%
mutate(gr = as.character(gr) %>% as.numeric())
# calculate ecdf with binsize 30
df <- tibble(x=b) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(df_hist$n))
ggplot(df_hist, aes(gr, n)) +
geom_col(width = 2, color = "white") +
geom_line(data = df, aes(x=x, y=Ecdf*max(df_hist$n)), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(df_hist$n), name = "Ecdf"))
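To see why the first attempt was off, it may help to compare the two candidate scale factors directly. This is a base R sketch with 30 equal-width bins of its own (an assumption; they are not the exact bins ggplot2 or the code above would produce): `max(a)` is the largest observation, in x units, whereas the curve has to be stretched to the tallest bar, in count units.

```r
set.seed(15)
a <- rnorm(500, 50, 10)

# 30 equal-width bins spanning the data (31 edges)
edges <- seq(min(a), max(a), length.out = 31)
counts <- as.vector(table(cut(a, breaks = edges, include.lowest = TRUE)))

# The two quantities are on different scales and generally differ
print(max(a))       # largest data value (x axis units)
print(max(counts))  # height of the tallest bar (y axis units)

# Correct scaling: the ECDF reaches the top of the tallest bar at max(a)
ecdf_scaled <- ecdf(a)(edges) * max(counts)
print(range(ecdf_scaled))
```

The secondary axis then has to divide by the same `max(counts)` to map the left-hand scale back to the 0 to 1 range.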
As already pointed out, this is problematic because the plots you want to merge have such different y-scales. You can try
set.seed(15)
mydata<-runif(50)
hist(mydata, freq=F)
lines(ecdf(mydata))
to get
Although a bit late... Another version which is working with preset bins:
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
ec <- ecdf(dt)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
lines(x = c(0,100), y=c(1,1)*max(h$counts), col ='red', lty = 3) # indicates 100%
x90 <- h$mids[which.min(abs(ec(h$mids) - 0.9))] # x value (not the bin index) where 90% is reached
lines(x = c(x90, x90), y = c(0, max(h$counts)), col ='black', lty = 3)
(Only the second y-axis is not working yet...)
In addition to the previous answers, I wanted to have ggplot do the tedious calculation (in contrast to @Roman's solution, which was kindly updated upon my request), i.e., calculate and draw the histogram and calculate and overlay the ECDF. I came up with the following (pseudo code):
# 1. Prepare the plot
plot <- ggplot() + geom_histogram(...)
# 2. Get the max value of Y axis as calculated in the previous step
maxPlotY <- max(ggplot_build(plot)$data[[1]]$y)
# 3. Overlay scaled ECDF and add secondary axis
plot +
stat_ecdf(aes(y=..y..*maxPlotY)) +
scale_y_continuous(name = "Density", sec.axis = sec_axis(trans = ~./maxPlotY, name = "ECDF"))
This way you don't need to calculate everything beforehand and feed the results to ggplot. Just sit back and let it do everything for you!
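A runnable sketch of the same three steps is given below, with some loud assumptions. The data and the 30 bins are made up for illustration. More importantly, step 3 does not use stat_ecdf: it draws the cumulative share of the binned counts instead (a binned approximation of the ECDF), a substitution made here because the name of the computed variable that stat_ecdf exposes has differed between ggplot2 versions. It needs a ggplot2 version that provides after_stat(), and the whole block is skipped where ggplot2 is not installed.

```r
# Runs only where ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  set.seed(15)
  mydata <- data.frame(x = rnorm(500, 50, 10))  # example data (assumption)

  # 1. Prepare the plot; ggplot2 does the binning
  p <- ggplot(mydata, aes(x)) + geom_histogram(bins = 30)

  # 2. Height of the tallest bar, as computed by ggplot2
  max_plot_y <- max(ggplot_build(p)$data[[1]]$y)

  # 3. Overlay the cumulative share of the same bins, stretched to the count
  #    scale, and add a secondary axis that maps counts back to [0, 1]
  p_ecdf <- p +
    geom_step(aes(y = after_stat(cumsum(count) / sum(count) * max_plot_y)),
              stat = "bin", bins = 30, colour = "red") +
    scale_y_continuous(name = "Count",
                       sec.axis = sec_axis(~ . / max_plot_y, name = "ECDF"))
  print(p_ecdf)
}
```

The left axis is labelled "Count" here because geom_histogram plots counts by default; with a density histogram the same scaling idea applies with the maximum density in place of `max_plot_y`.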