How To Center Axes in ggplot2 - r

In the following plot, which is a simple scatter plot + theme_apa(), I would like both axes to pass through 0.
I tried some of the solutions proposed in answers to similar questions, but none of them worked.
A MWE to reproduce the plot:
library(papaja)
library(ggplot2)
library(MASS)
plot_two_factor <- function(factor_sol, groups) {
the_df <- as.data.frame(factor_sol)
the_df$groups <- groups
p1 <- ggplot(data = the_df, aes(x = MR1, y = MR2, color = groups)) +
geom_point() + theme_apa()
}
set.seed(131340)
n <- 30
group1 <- mvrnorm(n, mu=c(0,0.6), Sigma = diag(c(0.01,0.01)))
group2 <- mvrnorm(n, mu=c(0.6,0), Sigma = diag(c(0.01,0.01)))
factor_sol <- rbind(group1, group2)
colnames(factor_sol) <- c("MR1", "MR2")
groups <- as.factor(rep(c(1,2), each = n))
print(plot_two_factor(factor_sol, groups))
The papaja package can be installed via
devtools::install_github("crsh/papaja")

What you request cannot be achieved in ggplot2, and for a good reason: if you include axis and tick labels within the plotting area, they will sooner or later overlap with points or lines representing data. I used @phiggins' and @Job Nmadu's answers as a starting point. I changed the order of the geoms to make sure the "data" are plotted on top of the axes. I changed the theme to theme_minimal() so that axes are not drawn outside the plotting area. I modified the offsets used for the data to better demonstrate how the code works.
library(ggplot2)
library(magrittr) # provides the %>% pipe used below
iris %>%
ggplot(aes(Sepal.Length - 5, Sepal.Width - 2, col = Species)) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_point() +
theme_minimal()
This gets as close as possible to answering the question using ggplot2.
Using package 'ggpmisc' we can slightly simplify the code.
library(ggpmisc)
iris %>%
ggplot(aes(Sepal.Length - 5, Sepal.Width - 2, col = Species)) +
geom_quadrant_lines(linetype = "solid") +
geom_point() +
theme_minimal()
This code produces exactly the same plot as shown above.
If you want to always have the origin centered, i.e., symmetrical plus and minus limits in the plots irrespective of the data range, then package 'ggpmisc' provides a simple solution with function symmetric_limits(). This is how quadrant plots for gene expression and similar bidirectional responses are usually drawn.
iris %>%
ggplot(aes(Sepal.Length - 5, Sepal.Width - 2, col = Species)) +
geom_quadrant_lines(linetype = "solid") +
geom_point() +
scale_x_continuous(limits = symmetric_limits) +
scale_y_continuous(limits = symmetric_limits) +
theme_minimal()
The grid can be removed from the plotting area by adding + theme(panel.grid = element_blank()) after theme_minimal() to any of the three examples.
Loading 'ggpmisc' just for function symmetric_limits() is overkill, so here I show its definition, which is extremely simple:
symmetric_limits <- function (x)
{
max <- max(abs(x))
c(-max, max)
}
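As a quick check of what this function returns, the following base R sketch calls it on two hand-picked ranges. The function body repeats the logic of the definition above so that the snippet runs on its own; the input vectors are made up for illustration and are not taken from the question.

```r
# Same logic as the definition above, repeated so the snippet is self-contained
symmetric_limits <- function(x) {
  max_abs <- max(abs(x))
  c(-max_abs, max_abs)
}

# When used as `limits`, the function receives the automatic limits of the
# scale as a numeric vector of length two; these inputs are illustrative
lims_a <- symmetric_limits(c(-0.7, 2.9))
lims_b <- symmetric_limits(c(-4, 1))
print(lims_a)
print(lims_b)
```

In both cases the larger absolute value decides the limits, so 0 ends up in the middle of the axis.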

For the record, the following also works as above.
iris %>%
ggplot(aes(Sepal.Length-6.2, Sepal.Width-3.2, col = Species)) +
geom_point() +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0)

Setting xlim and ylim should work.
library(tidyverse)
# default
iris %>%
ggplot(aes(Sepal.Length, Sepal.Width, col = Species)) +
geom_point()
# setting xlim and ylim
iris %>%
ggplot(aes(Sepal.Length, Sepal.Width, col = Species)) +
geom_point() +
xlim(c(0,8)) +
ylim(c(0,4.5))
Created on 2020-06-12 by the reprex package (v0.3.0)

While the question is not very clear, PoGibas seems to think that this is what the OP wanted.
library(tidyverse)
# shifted data, symmetric limits, and axis lines through 0
iris %>%
ggplot(aes(Sepal.Length-6.2, Sepal.Width-3.2, col = Species)) +
geom_point() +
xlim(c(-2.5,2.5)) +
ylim(c(-1.5,1.5)) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0)
Created on 2020-06-12 by the reprex package (v0.3.0)
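For completeness, here is a sketch of how the approach from the answers above might be applied to the simulated data in the question. It assumes theme_minimal() is an acceptable stand-in for theme_apa() (so papaja is not needed), and it defines symmetric_limits() locally with the same logic as the 'ggpmisc' function rather than loading that package.

```r
library(ggplot2)
library(MASS)  # for mvrnorm()

# Simulated two-factor solution, as in the question
set.seed(131340)
n <- 30
group1 <- mvrnorm(n, mu = c(0, 0.6), Sigma = diag(c(0.01, 0.01)))
group2 <- mvrnorm(n, mu = c(0.6, 0), Sigma = diag(c(0.01, 0.01)))
the_df <- as.data.frame(rbind(group1, group2))
names(the_df) <- c("MR1", "MR2")
the_df$groups <- factor(rep(c(1, 2), each = n))

# Local helper with the same logic as ggpmisc::symmetric_limits()
symmetric_limits <- function(x) {
  max_abs <- max(abs(x))
  c(-max_abs, max_abs)
}

p <- ggplot(the_df, aes(x = MR1, y = MR2, color = groups)) +
  geom_hline(yintercept = 0) +  # axis lines first, so points are drawn on top
  geom_vline(xintercept = 0) +
  geom_point() +
  scale_x_continuous(limits = symmetric_limits) +
  scale_y_continuous(limits = symmetric_limits) +
  theme_minimal() +
  theme(panel.grid = element_blank())
print(p)
```

The tick labels still sit at the edge of the panel; only the two lines cross at the origin.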

Related

How to change the limits from scale_y_continuous depending on the plot in R?

I want to draw boxplots with the number of observations on top. The problem is that depending on the information and the outliers, the y-axis changes. For that reason, I want to change the limits of scale_y_continuous automatically. Is it possible to do this?
This is a reproducible example:
library(dplyr)
library(ggplot2)
myFreqs <- mtcars %>%
group_by(cyl, am) %>%
summarise(Freq = n())
myFreqs
p <- ggplot(mtcars, aes(factor(cyl), drat, fill=factor(am))) +
stat_boxplot(geom = "errorbar") +
geom_boxplot() +
stat_summary(geom = 'text', label = paste("n = ", myFreqs$Freq), fun = max, position = position_dodge(width = 0.77), vjust=-1)
p
The idea is to extend the upper limit of the y-axis to at least 1 above the maximum value in the plot (in the case explained above, that maximum belongs to the second boxplot, with n=8).
I have tried to change the y-axis with scale_y_continuous like this:
p <- p + scale_y_continuous(limits = c(0, 5.3))
p
However, I don't want to put the limits myself, I want to find a way to modify the limits according to the plots that I have. (Because... what if the information changes?).
Is there a way to do something like this? With min and max --> scale_y_continuous(limits = c(min(x), max(x)))
Thanks very much in advance
Thanks to @teunbrand and @caldwellst I got the solution that I needed.
There are 3 solutions that work perfectly:
1-
p + scale_y_continuous(limits = function(x){
c(min(x), (max(x)+0.1))
})
p
2-
library(tidyverse)
p + scale_y_continuous(limits = ~ c(min(.x), max(.x) + 0.1))
3-
p + scale_y_continuous(limits = function(x){
c(min(x), ceiling(max(x) * 1.1))
})
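All three variants rely on the same mechanism: when limits is a function, ggplot2 calls it with the automatic limits of the scale and uses whatever it returns. Here is a minimal sketch of such a function outside of any plot; pad_upper and its pad argument are hypothetical names chosen for illustration.

```r
# Hypothetical helper: keep the lower limit, raise the upper limit by `pad`
pad_upper <- function(x, pad = 0.1) {
  c(min(x), max(x) + pad)
}

# The range of drat in mtcars stands in for the limits ggplot2 would pass in
auto_limits <- range(mtcars$drat)
new_limits <- pad_upper(auto_limits, pad = 0.5)
print(auto_limits)
print(new_limits)
```

A helper like this could then be passed directly, as in p + scale_y_continuous(limits = pad_upper).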

Is there a way to add a legend to refer to residuals in an effect plot?

I'm looking to add a legend to my plot, for the moment the code I wrote is:
plot(allEffects(covid.lm, residuals=T), # plot with countries on graph
band.colors="grey2",
residuals.color=adjustcolor("steelblue3",alpha.f=0.5),
residuals.pch=16, smooth.residuals=F,
id = list(n=length(d$COUNTRY), cex=0.5))
Basically, I added numbers to the points in the plot (for which I created a linear model, covid.lm). Now I need to add a legend for those points, that is, a list of countries. Thanks in advance.
library(ggplot2)
data(iris)
m <- lm(Petal.Length ~ Sepal.Length, data = iris)
iris$Fitted <- predict(m)
iris$Species_num <- as.numeric(iris$Species)
ggplot(iris, aes(x = Petal.Length, y = Fitted)) +
geom_point(aes(color = as.factor(Species_num))) +
geom_text(aes(label = Species_num), hjust = 1.1, vjust = 1.1) +
labs(title = "Residuals", x = "Observed", y = "Fitted") +
guides(color=guide_legend(title="New Legend Title"))
Created on 2022-04-08 by the reprex package (v2.0.1)

Colour points for x-axis plot R

I want to give the points relating to the first 130 x-axis values a different colour than the rest (up to 250). So basically, divide the points vertically with two different colours. Is this possible and how would you go about it?
Welcome to SO!
I would use ggplot2
Here are some examples:
library(ggplot2)
ggplot(mtcars,aes(hp,mpg,color = mpg < 20)) +
geom_point()
ggplot(mtcars,aes(hp,mpg,color = mpg < 20)) +
geom_point() +
theme(legend.position = 'none')
ggplot(mtcars,aes(hp,mpg,color = mpg < 20)) +
geom_point() +
labs(color = 'mpg less than 20')
ggplot(mtcars,aes(hp,mpg,color = mpg < 20)) +
geom_point() +
scale_color_manual(values = c('purple4','springgreen4'))
Good luck!
You can use the row_number for the colours.
library(ggplot2)
library(dplyr)
data(mpg)
mpg %>%
mutate(colour=row_number(displ)<=130) %>%
ggplot(aes(x=displ, y=cty, col=colour)) +
geom_point(show.legend=FALSE) + theme_bw()
It also seems there is a tie at about 3.5.
Using base R, you can try:
plot(iris$Sepal.Length, iris$Sepal.Width, col = rep(1:2, times = c(130, nrow(iris)-130)))
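If the split should depend on the x value itself rather than on row order, a grouping column can be derived from x first. A minimal sketch with made-up data follows; the data frame d, the x values 1 to 250, and the cut-off of 130 are assumptions based on the question's description.

```r
# Made-up data: x runs from 1 to 250, as described in the question
set.seed(1)
d <- data.frame(x = 1:250, y = rnorm(250))

# Two-level factor: points with x up to 130 versus the rest
d$part <- factor(ifelse(d$x <= 130, "x <= 130", "x > 130"),
                 levels = c("x <= 130", "x > 130"))

# Base R plot; the integer codes of the factor pick the colour
plot(d$x, d$y, col = c("purple4", "springgreen4")[as.integer(d$part)], pch = 16)
abline(v = 130.5, lty = 2)  # mark the boundary between the two groups
```

With ggplot2, the same column could be mapped with aes(x, y, colour = part).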

ggplot add geom (specifically geom_hline) which doesn't affect limits

I have some data that I would like to plot a threshold on, only if the data approaches the threshold. Therefore I would like to have a horizontal line at my threshold, but not extend the y axis limits if this value wouldn't have already been included. As my data is faceted it is not feasible to pre-calculate limits and I am doing it for many different data sets so would get very messy. This question seems to be asking the same thing but the answers are not relevant to me: ggplot2: Adding a geom without affecting limits
Simple example.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length))+geom_point()+facet_wrap(~Species, scales = "free")+geom_hline(yintercept = 7)
which gives me
But I would like this (created in paint) where the limits have not been impacted by the geom_hline
Created on 2020-01-21 by the reprex package (v0.3.0)
You can automate this by checking whether a given facet has a maximum y-value that exceeds the threshold.
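library(dplyr)
threshold <- 7
iris %>%
ggplot(aes(Sepal.Width, Sepal.Length)) +
geom_point() +
facet_wrap(~Species, scales = "free") +
# yintercept must be mapped inside aes(); passed as a plain argument,
# geom_hline() discards `data` and draws the line in every facet
geom_hline(data = . %>%
group_by(Species) %>%
filter(max(Sepal.Length, na.rm=TRUE) >= threshold),
aes(yintercept = threshold))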
Adapting from this post:
How can I add a line to one of the facets?
library(tidyverse)
iris %>%
ggplot(aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
facet_wrap(~Species, scales = "free") +
geom_hline(data = . %>% filter(Species != "setosa"), aes(yintercept = 7))
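To see which facets would receive the line under the "maximum reaches the threshold" rule, the per-species check can be run on its own in base R. This is only a sketch of the filtering condition, not of the plot; the variable names are chosen for illustration.

```r
threshold <- 7

# Largest Sepal.Length per species, as a named numeric vector
max_by_species <- vapply(split(iris$Sepal.Length, iris$Species), max, numeric(1))

# TRUE where the facet's data reach the threshold, so a line would be drawn
shows_line <- max_by_species >= threshold
print(max_by_species)
print(shows_line)
```

For iris, only the setosa facet stays below 7, which matches the hard-coded filter(Species != "setosa") in the second answer.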

How to apply separate coord_cartesian() to "zoom in" into individual panels of a facet_grid()?

Inspired by the Q Finding the elbow/knee in a curve I started to play around with smooth.spline().
In particular, I want to visualize how the parameter df (degrees of freedom) influences the approximation and the first and second derivative. Note that this Q is not about approximation but about a specific problem (or edge case) in visualisation with ggplot2.
First attempt: simple facet_grid()
library(ggplot2)
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
dp is a data.table containing the data points for which an approximation is sought and ap is a data.table with the approximated data plus the derivatives (data are given below).
For each row, facet_grid() with scales = "free_y" has chosen a scale which displays all data. Unfortunately, one panel has kind of "outliers" which make it difficult to see details in the other panels. So, I want to "zoom in".
"Zoom in" using coord_cartesian()
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 50))
With the manually selected range, more details in the panels of row 3 have been made visible. But the limit has been applied to all panels of the grid. So, in row 1, details can hardly be distinguished.
What I'm looking for is a way to apply coord_cartesian() with specific parameters separately to each individual panel (or group of panels, e.g., rowwise) of the grid. For instance, is it possible to manipulate the ggplot object afterwards?
Workaround: Combine individual plots with cowplot
As a workaround, we can create three separate plots and combine them afterwards using the cowplot package:
g0 <- ggplot(ap[deriv == 0], aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
g1 <- ggplot(ap[deriv == 1], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-50, 50))
g2 <- ggplot(ap[deriv == 2], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 100))
cowplot::plot_grid(g0, g1, g2, ncol = 1, align = "v")
Unfortunately, this solution
requires writing code to create three separate plots,
duplicates strips and axes, and adds whitespace that is then not available for displaying the data.
Is facet_wrap() an alternative?
We can use facet_wrap() instead of facet_grid():
ggplot(ap, aes(x, y)) +
# geom_point(data = dp, alpha = 0.2) + # this line causes error message
geom_line() +
facet_wrap(~ deriv + df, scales = "free_y", labeller = label_both, nrow = 3) +
theme_bw()
Now, the y-axes of every panel are scaled individually exhibiting details of some of the panels. Unfortunately, we still can't "zoom in" into the bottom right panel because using coord_cartesian() would affect all panels.
In addition, the line
geom_point(data = dp, alpha = 0.2)
strangely causes
Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", :
only 'grobs' allowed in "gList"
I had to comment this line out, so the data points which are to be approximated are not displayed.
Data
library(data.table)
# data points
dp <- data.table(
x = c(6.6260, 6.6234, 6.6206, 6.6008, 6.5568, 6.4953, 6.4441, 6.2186,
6.0942, 5.8833, 5.7020, 5.4361, 5.0501, 4.7440, 4.1598, 3.9318,
3.4479, 3.3462, 3.1080, 2.8468, 2.3365, 2.1574, 1.8990, 1.5644,
1.3072, 1.1579, 0.95783, 0.82376, 0.67734, 0.34578, 0.27116, 0.058285),
y = 1:32,
deriv = 0)
# approximated data points and derivatives
ap <- rbindlist(
lapply(seq(2, length(dp$x), length.out = 4),
function(df) {
rbindlist(
lapply(0:2,
function(deriv) {
result <- as.data.table(
predict(smooth.spline(dp$x, dp$y, df = df), deriv = deriv))
result[, c("df", "deriv") := list(df, deriv)]
})
)
})
)
Late answer, but the following hack just occurred to me. Would it work for your use case?
Step 1. Create an alternative version of the intended plot, limiting the range of y values such that scales = "free_y" gives a desired scale range for each facet row. Also create the intended facet plot with the full data range:
library(ggplot2)
library(dplyr)
# alternate plot version with truncated data range
p.alt <- ap %>%
group_by(deriv) %>%
mutate(upper = quantile(y, 0.75),
lower = quantile(y, 0.25),
IQR.multiplier = (upper - lower) * 10) %>%
ungroup() %>%
mutate(is.outlier = y < lower - IQR.multiplier | y > upper + IQR.multiplier) %>%
mutate(y = ifelse(is.outlier, NA, y)) %>%
ggplot(aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
# intended plot version with full data range
p <- p.alt %+% ap
Step 2. Use ggplot_build() to generate plot data for both ggplot objects. Apply the panel parameters of the alt version onto the intended version:
p <- ggplot_build(p)
p.alt <- ggplot_build(p.alt)
p$layout$panel_params <- p.alt$layout$panel_params
rm(p.alt)
Step 3. Build the intended plot from the modified plot data, & plot the result:
p <- ggplot_gtable(p)
grid::grid.draw(p)
Note: in this example, I truncated the data range by setting all values more than 10*IQR away from the upper / lower quartile in each facet row as NA. This can be replaced by any other logic for defining outliers.
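The outlier rule from the note can also be written as a small standalone function, which makes it easier to swap in different logic. This is a sketch in base R; is_far_outlier and its argument k are hypothetical names, and the input vector is made up.

```r
# Hypothetical helper: flag values more than `k` IQRs beyond the quartiles
is_far_outlier <- function(y, k = 10) {
  q <- quantile(y, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- (q[2] - q[1]) * k
  y < q[1] - spread | y > q[2] + spread
}

# Made-up values: one extreme point among otherwise moderate ones
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 500)
flags <- is_far_outlier(y)
y_trimmed <- ifelse(flags, NA, y)  # same replacement as in Step 1
print(y_trimmed)
```

Inside the pipeline from Step 1, such a function would be applied per group, after group_by(deriv).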
