Is there a way to change the facet labels from 1:3 to something like c("good", "bad", "ugly")? I would also like to add the R2 value to each facet. Below is my code; I tried a few things but didn't succeed.
library(ggplot2)
library(tidyr)
DF = data.frame(SUB = rep(1:3, each = 100), Ob = runif(300, 50,100), S1 = runif(300, 75,95), S2 = runif(300, 40,90),
S3 = runif(300, 35,80),S4 = runif(300, 55,100))
FakeData = gather(DF, key = "Variable", value = "Value", -c(SUB,Ob))
ggplot(FakeData, aes(x = Ob, y = Value))+
geom_point()+ geom_smooth(method="lm") + facet_grid(Variable ~ SUB, scales = "free_y")+
theme_bw()
Here is the figure I get using the above code.
I tried the code below to change the facet labels, but it didn't work:
ggplot(FakeData, SUB = factor(SUB, levels = c("Good", "Bad","Ugly")), aes(x = Ob, y = Value))+
geom_point()+ geom_smooth(method="lm") + facet_grid(Variable ~ SUB, scales = "free_y")+
theme_bw()
I have no idea how to add R2 to the facets. Is there an efficient way of computing R2 and adding it to the facets?
You can use ggpubr::stat_cor() to easily add correlation coefficients to your plot.
library(dplyr)
library(ggplot2)
library(ggpubr)
FakeData %>%
mutate(SUB = factor(SUB, labels = c("good", "bad", "ugly"))) %>%
ggplot(aes(x = Ob, y = Value)) +
geom_point() +
geom_smooth(method = "lm") +
facet_grid(Variable ~ SUB, scales = "free_y") +
theme_bw() +
stat_cor(aes(label = after_stat(rr.label)), color = "red", geom = "label")
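Why the attempt in the question does not relabel anything: in factor(), levels = lists the values that already occur in the vector, and labels = supplies the names to display for them. Passing the new names as levels means none of 1, 2, 3 matches, so every element becomes NA. The conversion also has to happen in the data itself (with mutate() as above); an extra SUB = factor(...) argument given to ggplot() does not change the data. A minimal base-R sketch, where the short SUB vector is a stand-in for the question's column:

```r
# SUB is a small stand-in for the question's SUB column (values 1, 2, 3)
SUB <- rep(1:3, each = 2)

# Wrong: "Good", "Bad", "Ugly" are not values of SUB, so nothing matches
# and every element becomes NA
wrong <- factor(SUB, levels = c("Good", "Bad", "Ugly"))
print(wrong)

# Right: levels = the existing values, labels = the names to display
right <- factor(SUB, levels = 1:3, labels = c("good", "bad", "ugly"))
print(right)
table(right)
```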
If you don't want to use functions from other packages and only want to use ggplot2, you need to compute the R2 for each SUB and Variable combination and then add it to your plot with geom_text() or geom_label(). Here is one way to do it.
library(tidyverse)
set.seed(1)
DF = data.frame(SUB = rep(1:3, each = 100), Ob = runif(300, 50,100), S1 = runif(300, 75,95), S2 = runif(300, 40,90),
S3 = runif(300, 35,80),S4 = runif(300, 55,100))
FakeData = gather(DF, key = "Variable", value = "Value", -c(SUB,Ob))
FakeData_lm <- FakeData %>%
group_by(SUB, Variable) %>%
nest() %>%
# Fit linear model
mutate(Mod = map(data, ~lm(Value ~ Ob, data = .x))) %>%
# Get the R2
mutate(R2 = map_dbl(Mod, ~round(summary(.x)$r.squared, 3)))
ggplot(FakeData, aes(x = Ob, y = Value))+
geom_point()+
geom_smooth(method="lm") +
# Add label
geom_label(data = FakeData_lm,
aes(x = Inf, y = Inf,
label = paste("R2 =", R2)),
hjust = 1, vjust = 1) +
facet_grid(Variable ~ SUB, scales = "free_y") +
theme_bw()
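The modelling step can also be done without the tidyverse. A base-R sketch using split(), lm() and summary(), assuming the question's simulated data and replacing gather() with a hand-built long format; the resulting data frame has SUB and Variable columns, so it could be passed to geom_label(data = ...) in the same way as FakeData_lm:

```r
# Sketch: per-panel R^2 without the tidyverse, assuming the question's
# simulated data (4 variables x 3 SUB values = 12 panels)
set.seed(1)
DF <- data.frame(SUB = rep(1:3, each = 100), Ob = runif(300, 50, 100),
                 S1 = runif(300, 75, 95), S2 = runif(300, 40, 90),
                 S3 = runif(300, 35, 80), S4 = runif(300, 55, 100))

# Long format built by hand (instead of tidyr::gather())
vars <- c("S1", "S2", "S3", "S4")
long <- data.frame(
  SUB = rep(DF$SUB, times = length(vars)),
  Ob = rep(DF$Ob, times = length(vars)),
  Variable = rep(vars, each = nrow(DF)),
  Value = unlist(DF[vars], use.names = FALSE)
)

# One lm() per SUB x Variable combination; keep the keys and R^2
groups <- split(long, list(long$SUB, long$Variable), drop = TRUE)
r2_tab <- do.call(rbind, lapply(groups, function(d) {
  fit <- lm(Value ~ Ob, data = d)
  data.frame(SUB = d$SUB[1], Variable = d$Variable[1],
             R2 = round(summary(fit)$r.squared, 3))
}))
rownames(r2_tab) <- NULL
r2_tab$label <- paste("R2 =", r2_tab$R2)
r2_tab
```

For a simple regression with one predictor, this R2 equals the squared Pearson correlation between Ob and Value within each panel.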
The following answer makes use of the 'ggpmisc' package (version >= 0.5.0 for the second example). In addition, I simply used a call to factor() within the facet_grid() formula to set the labels.
library(tidyverse)
library(ggpmisc)
DF = data.frame(SUB = rep(1:3, each = 100), Ob = runif(300, 50,100), S1 = runif(300, 75,95), S2 = runif(300, 40,90),
S3 = runif(300, 35,80),S4 = runif(300, 55,100))
FakeData = gather(DF, key = "Variable", value = "Value", -c(SUB,Ob))
# As asked in the question
# Ensuring that the R^2 label does not overlap the observations
ggplot(FakeData, aes(x = Ob, y = Value)) +
geom_point()+
geom_smooth(method = "lm") +
stat_poly_eq() +
scale_y_continuous(expand = expansion(mult = c(0.1, 0.33))) +
facet_grid(Variable ~ factor(SUB,
levels = 1:3,
labels = c("good", "bad", "ugly")),
scales = "free_y") +
theme_bw()
# As asked in a comment, adding P-value
ggplot(FakeData, aes(x = Ob, y = Value))+
geom_point()+
geom_smooth(method = "lm") +
stat_poly_eq(mapping = use_label(c("R2", "P")), p.digits = 2) +
scale_y_continuous(expand = expansion(mult = c(0.1, 0.33))) +
facet_grid(Variable ~ factor(SUB,
levels = 1:3,
labels = c("good", "bad", "ugly")),
scales = "free_y")+
theme_bw()
And the plot from the second example adding P to the label.
Note: With older versions of 'ggpmisc' that lack the function use_label(), the mapping can be written as aes(label = paste(after_stat(rr.label), after_stat(p.label), sep = "*\", \"*")), in the same way as when using 'ggpubr'.
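The sep = "*\", \"*" trick works because these labels are plotmath expressions: * places two terms side by side without a space, so the separator inserts the literal string ", " between the R2 part and the P part. A base-R sketch that checks the combined string still parses; the two label strings are hypothetical examples in the style of rr.label and p.label, not exact package output:

```r
# Hypothetical plotmath strings in the style of rr.label / p.label
rr <- "italic(R)^2~`=`~\"0.012\""
pv <- "italic(P)~`=`~\"0.43\""

# sep = "*\", \"*" juxtaposes a literal ", " between the two parts
combined <- paste(rr, pv, sep = "*\", \"*")
cat(combined, "\n")

# parse() fails with an error if the result is not valid plotmath/R syntax
expr <- parse(text = combined)
print(expr)
```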
Package 'ggpubr' includes code copied from 'ggpmisc' without acknowledgement, which explains why some statistics are so similar between the two packages. 'ggpmisc' is more narrowly focused, but its statistics have been much improved since they were taken into 'ggpubr'. 'ggpmisc' is actively maintained, while maintenance of 'ggpubr' currently seems stalled.
Related
I am trying to add dollar formatting with scales::dollar_format() from the scales package to a single panel (sales) of a facet_wrap() plot.
library(ggplot2)
library(tidyr)
z <- data.frame(months = month.name, sales = runif(12, min = 3000000, max = 60000000), cases = round(runif(12, min = 100, max = 1000),0)) %>%
pivot_longer(!months, names_to = "variable", values_to = "metric")
ggplot(data = z,
aes(x = months, y = metric)) +
geom_bar(stat = 'identity') +
facet_wrap(~ variable, ncol = 1, scales = "free_y")
I've tried using scale_y_continuous(labels = scales::dollar_format()), but that applies the formatting to both panels.
How can I add this to only the Sales plot and not the Cases plot?
There is no elegant way to do this in vanilla ggplot2. In your particular case, because your two scales have very different ranges, you can hack together a solution by passing a function that formats the labels conditionally.
library(ggplot2)
library(tidyr)
z <- data.frame(months = month.name, sales = runif(12, min = 3000000, max = 60000000), cases = round(runif(12, min = 100, max = 1000),0)) %>%
pivot_longer(!months, names_to = "variable", values_to = "metric")
p <- ggplot(data = z,
aes(x = months, y = metric)) +
geom_bar(stat = 'identity') +
facet_wrap(~ variable, ncol = 1, scales = "free_y")
p + scale_y_continuous(
labels = ~ if (any(.x > 1e3, na.rm = TRUE)) scales::dollar(.x) else .x
)
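The lambda works because, with scales = "free_y", each panel gets its own y scale, so the labels function only sees one panel's breaks at a time. A base-R sketch of the same rule that can be checked outside a plot; dollar_like() is a hypothetical stand-in for scales::dollar(), used only so the example needs no packages:

```r
# Hypothetical stand-in for scales::dollar(): "$" plus thousands separators
dollar_like <- function(x) {
  paste0("$", formatC(x, format = "f", digits = 0, big.mark = ","))
}

# Same rule as the answer: large breaks can only come from the sales panel
label_if_large <- function(x) {
  if (any(x > 1e3, na.rm = TRUE)) dollar_like(x) else x
}

sales_breaks <- c(0, 2e7, 4e7, 6e7)
# an NA is included to show that na.rm = TRUE keeps the if () condition valid
cases_breaks <- c(NA, 250, 500, 750)

label_if_large(sales_breaks)
label_if_large(cases_breaks)
```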
If your case gets more complicated, however, you could use ggh4x::facetted_pos_scales() to give a particular scale to a particular panel. (Disclaimer: I'm the author of ggh4x.)
p + ggh4x::facetted_pos_scales(y = list(
variable == "sales" ~ scale_y_continuous(labels = scales::dollar_format())
))
Created on 2022-03-09 by the reprex package (v2.0.1)
I'm trying to replace the facet_wrap() titles on a ggplot bar plot with expressions, but I'm having no luck. I've tried the solutions here and here, but neither seems to work for me.
The whole dataset is quite large, so here's some dummy data to illustrate the problem.
library(tidyr)
library(ggplot2)
data<-data.frame(species = rep(c("oak", "elm", "ash"), each = 5),
resp_1 = (runif(15, 1,100)),
resp_2 = (runif(15, 1,100)),
resp_3 = (runif(15, 1,100)),
resp_4 = (runif(15, 1,100)),
resp_5 = (runif(15, 1,100)))
### transform to longform with tidyr
data_2 <- gather(data, response, result, resp_1:resp_5, factor_key=TRUE)
### plot with ggplot2
ggplot(data_2, aes(x = species, y = result, fill = species))+
geom_bar(stat = 'sum')+
facet_wrap(~ response)
### here are the labels I'd like to see on the facets
oxygen <-expression ("Oxygen production (kg/yr)")
runoff <-expression("Avoided runoff " ~ (m ^{3} /yr))
co <- expression("CO removal (g/yr)")
o3 <- expression("O"[3]~" removal (g/yr)")
no2 <- expression("NO"[2]~" removal (g/yr)")
labels <- c(oxygen, runoff, co, o3, no2)
### this doesn't work
ggplot(data_2, aes(x = species, y = result, fill = species))+
geom_bar(stat = 'sum')+
facet_wrap(~ response, labeller = labeller(response = labels))
### close, but doesn't work
levels(data_2$response)<-labels
ggplot(data_2, aes(x = species, y = result, fill = species))+
geom_bar(stat = 'sum')+
facet_wrap(~ response, labeller = labeller(response = labels))
### produces an error
ggplot(data_2, aes(x = species, y = result, fill = species))+
geom_bar(stat = 'sum')+
facet_wrap(~ response, labeller = label_parsed)
I'd also like to get rid of the second legend in grey titled "n".
Right now your expression names don't match the values used for the facets, so I'd recommend storing your labels in a named expression:
labels <- expression(
resp_1 = "Oxygen production (kg/yr)",
resp_2 = "Avoided runoff " ~ (m ^{3} /yr),
resp_3 = "CO removal (g/yr)",
resp_4 = "O"[3]~" removal (g/yr)",
resp_5 = "NO"[2]~" removal (g/yr)"
)
Then you can write your own labeller function to extract the correct value:
ggplot(data_2, aes(x = species, y = result, fill = species))+
geom_bar(stat = 'sum', show.legend = c(size=FALSE))+
facet_wrap(~ response, labeller = function(x) {
list(as.list(labels)[x$response])
})
We've also used show.legend = c(size = FALSE) to turn off the "n" legend.
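One detail worth checking in a labeller like this: when a factor is used to index a list or an expression vector, R uses the factor's integer codes, not its level names, so the result depends on the level order. Converting with as.character() selects by name instead. A base-R sketch; facet_vals is a hypothetical stand-in for the data frame a labeller receives, with its levels deliberately in reverse order:

```r
labels <- expression(
  resp_1 = "Oxygen production (kg/yr)",
  resp_2 = "Avoided runoff " ~ (m^{3}/yr),
  resp_3 = "CO removal (g/yr)",
  resp_4 = "O"[3]~" removal (g/yr)",
  resp_5 = "NO"[2]~" removal (g/yr)"
)

# Hypothetical facet values whose factor levels are in reverse order
facet_vals <- data.frame(
  response = factor(c("resp_3", "resp_1"), levels = paste0("resp_", 5:1))
)

# Indexing by the factor uses its codes (3 and 5 here), not its labels
by_code <- labels[facet_vals$response]
# Indexing by the character values matches the names of `labels`
by_name <- labels[as.character(facet_vals$response)]

names(by_code)
names(by_name)
```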
Use as_labeller and label_parsed. Ref
library(tidyr)
library(ggplot2)
data <- data.frame(species = rep(c("oak", "elm", "ash"), each = 5),
resp_1 = (runif(15, 1, 100)),
resp_2 = (runif(15, 1, 100)),
resp_3 = (runif(15, 1, 100)),
resp_4 = (runif(15, 1, 100)),
resp_5 = (runif(15, 1, 100)))
data_2 <- gather(data, response, result, resp_1:resp_5, factor_key = TRUE)
# setup the labels
response_names <- c(
`resp_1` = "Oxygen~production~(kg*yr^{-1})",
`resp_2` = "Avoided~runoff~(m^{3}*yr^{-1})",
`resp_3` = "CO~removal~(g*yr^{-1})",
`resp_4` = "O[3]~removal~(g*yr^{-1})",
`resp_5` = "NO[2]~removal~(g*yr^{-1})"
)
# plot
ggplot(data_2, aes(x = species, y = result, fill = species))+
geom_bar(stat = 'sum')+
facet_wrap(
~ response,
labeller = labeller(response = as_labeller(response_names, label_parsed))
) +
guides(size = "none")
Created on 2021-04-30 by the reprex package (v2.0.0)
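label_parsed treats each label as a plotmath expression, which is why the strings above use ~ for spaces and * for juxtaposition: a plain phrase with spaces such as "CO removal (g/yr)" is not valid R syntax. A base-R sketch that checks each label string parses before it is handed to the labeller; the parses() helper is hypothetical and only for illustration:

```r
response_names <- c(
  resp_1 = "Oxygen~production~(kg*yr^{-1})",
  resp_2 = "Avoided~runoff~(m^{3}*yr^{-1})",
  resp_3 = "CO~removal~(g*yr^{-1})",
  resp_4 = "O[3]~removal~(g*yr^{-1})",
  resp_5 = "NO[2]~removal~(g*yr^{-1})"
)

# Hypothetical helper: TRUE if the string is valid R/plotmath syntax
parses <- function(s) {
  tryCatch({ parse(text = s); TRUE }, error = function(e) FALSE)
}

vapply(response_names, parses, logical(1))   # all TRUE
parses("CO removal (g/yr)")                  # FALSE: spaces between symbols
```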
I'm trying to display the equations on the plot using the stat_poly_eq function of ggpmisc.
My problem is how to replace the y = ... in each equation with y1 = ... and y2 = ..., depending on the key column.
I tried adding eq.with.lhs to the mapping, but it is not recognized there.
I tried passing a vector to eq.with.lhs, but then both elements were overlaid in each equation.
Do you have a better idea?
As a last resort, I could use geom_text() after calculating the equation coefficients myself, but that seems a less efficient way to solve the problem.
Here is a reprex of my problem.
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggpmisc)
data <- data.frame(x = rnorm(20)) %>%
mutate(y1 = 1.2*x + rnorm(20, sd=0.2),
y2 = 0.9*x + rnorm(20, sd=0.3)) %>%
gather(value = value, key = key, -x)
ggplot(data, aes(x = x, y = value)) +
geom_point(aes(shape = key, colour = key)) +
stat_poly_eq(aes(label = ..eq.label.., colour = key),
formula = y ~ poly(x, 1, raw = TRUE),
eq.x.rhs = "x",
# eq.with.lhs = c(paste0(expression(y[1]), "~`=`~"),
# paste0(expression(y[2]), "~`=`~")),
eq.with.lhs = paste0(expression(y[ind]), "~`=`~"),
parse = TRUE) +
ylab(NULL)
I'm not really sure if it's possible to do it through ggpmisc, but you can change the data once the plot is built, like so:
library(tidyverse)
library(ggpmisc)
data <- data.frame(x = rnorm(20)) %>%
mutate(y1 = 1.2*x + rnorm(20, sd=0.2),
y2 = 0.9*x + rnorm(20, sd=0.3)) %>%
gather(value = value, key = key, -x)
p <- ggplot(data, aes(x = x, y = value)) +
geom_point(aes(shape = key, colour = key)) +
stat_poly_eq(aes(label = ..eq.label.., colour = key),
formula = y ~ poly(x, 1, raw = TRUE),
eq.x.rhs = "x",
eq.with.lhs = paste0(expression(y), "~`=`~"),
parse = TRUE) +
ylab(NULL)
temp <- ggplot_build(p)
temp$data[[2]]$label <- temp$data[[2]]$label %>%
fct_relabel(~ str_replace(.x, "y", paste0("y[", 1:2, "]")))
grid::grid.newpage()
grid::grid.draw(ggplot_gtable(temp))
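The fallback mentioned in the question, computing the coefficients yourself, is also fairly short. A base-R sketch, assuming the question's simulated data, that fits one line per key and builds a plotmath label with a subscripted left-hand side; the two-decimal format is an arbitrary choice:

```r
set.seed(1)
x <- rnorm(20)
# Same structure as the question's data after gather(): one row per x and key
dat <- data.frame(
  x = rep(x, times = 2),
  key = rep(c("y1", "y2"), each = 20),
  value = c(1.2 * x + rnorm(20, sd = 0.2), 0.9 * x + rnorm(20, sd = 0.3))
)

eq_labels <- do.call(rbind, lapply(split(dat, dat$key), function(d) {
  cf <- unname(coef(lm(value ~ x, data = d)))
  idx <- sub("^y", "", d$key[1])            # "1" or "2"
  data.frame(
    key = d$key[1],
    # format: y[<idx>]~`=`~<intercept><signed slope>*x
    label = sprintf("y[%s]~`=`~%.2f%+.2f*x", idx, cf[1], cf[2])
  )
}))
rownames(eq_labels) <- NULL
eq_labels
```

These labels could then be drawn with geom_text(..., parse = TRUE) after adding x and y position columns of your choice to eq_labels.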
Based on the example in "Adding Regression Line Equation and R2 on graph", I am trying to include the regression line equation for my model in each facet. However, I can't figure out why this changes the limits of my x axis.
library(ggplot2)
library(reshape2)
df <- data.frame(year = seq(1979,2010), M02 = runif(32,-4,6),
M06 = runif(32, -2.4, 5.1), M07 = runif(32, -2, 7.1))
df <- melt(df, id = c("year"))
ggplot(data = df, mapping = aes(x = year, y = value)) +
geom_point() +
scale_x_continuous() +
stat_smooth_func(geom = 'text', method = 'lm', hjust = 0, parse = T) +
geom_smooth(method = 'lm', se = T) +
facet_wrap(~ variable) # as you can see, the scale_x_axis goes back to 1800
If I set the x-axis limits,
scale_x_continuous(limits = c(1979,2010))
the regression coefficients are no longer shown. What am I doing wrong here?
stat_smooth_func is available here: https://gist.github.com/kdauria/524eade46135f6348140
You can use the stat_poly_eq() function from the ggpmisc package.
library(reshape2)
library(ggplot2)
library(ggpmisc)
df <- data.frame(year = seq(1979,2010), M02 = runif(32,-4,6),
M06 = runif(32, -2.4, 5.1), M07 = runif(32, -2, 7.1))
df <- melt(df, id = c("year"))
formula1 <- y ~ x
ggplot(data = df, mapping = aes(x = year, y = value)) +
geom_point() +
scale_x_continuous() +
geom_smooth(method = 'lm', se = TRUE) +
stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")),
label.x = "left", label.y = "top",
formula = formula1, parse = TRUE, size = 3) +
facet_wrap(~ variable)
ggplot(data = df, mapping = aes(x = year, y = value)) +
geom_point() +
scale_x_continuous() +
geom_smooth(method = 'lm', se = TRUE) +
stat_poly_eq(aes(label = paste(..eq.label.., sep = "~~~")),
label.x = "left", label.y = 0.15,
eq.with.lhs = "italic(hat(y))~`=`~",
eq.x.rhs = "~italic(x)",
formula = formula1, parse = TRUE, size = 4) +
stat_poly_eq(aes(label = paste(..rr.label.., sep = "~~~")),
label.x = "left", label.y = "bottom",
formula = formula1, parse = TRUE, size = 4) +
facet_wrap(~ variable)
Created on 2019-01-10 by the reprex package (v0.2.1.9000)
Someone may suggest a better solution, but as an alternative you can modify stat_smooth_func so that its final line reads
data.frame(x=1979, y=ypos, label=func_string)
instead of
data.frame(x=xpos, y=ypos, label=func_string)
The plot then looks like the one below.
I've been trying to superimpose a normal curve over my histogram with ggplot2.
My code:
data <- read.csv (path...)
ggplot(data, aes(V2)) +
geom_histogram(alpha=0.3, fill='white', colour='black', binwidth=.04)
I tried several things:
+ stat_function(fun=dnorm)
...didn't change anything.
+ stat_density(geom = "line", colour = "red")
...gave me a straight red line on the x-axis.
+ geom_density()
doesn't work for me because I want to keep frequency counts on the y-axis rather than density values.
Any suggestions?
Solution found!
+geom_density(aes(y=0.045*..count..), colour="black", adjust=4)
Think I got it:
library(ggplot2)
set.seed(1)
df <- data.frame(PF = 10*rnorm(1000))
ggplot(df, aes(x = PF)) +
geom_histogram(aes(y =..density..),
breaks = seq(-50, 50, by = 10),
colour = "black",
fill = "white") +
stat_function(fun = dnorm, args = list(mean = mean(df$PF), sd = sd(df$PF)))
This has been answered here and partially here.
The area under a density curve equals 1, and the area under the histogram equals the width of the bars times the sum of their heights, i.e. the binwidth times the total number of non-missing observations. To fit both on the same graph, one or the other needs to be rescaled so that their areas match.
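The area argument can be checked numerically. A base-R sketch with a simulated sample and a binwidth of 2 (both arbitrary): the histogram's area is exactly bw * n_obs, and a normal density multiplied by the same factor integrates to the same value.

```r
set.seed(1)
value <- rnorm(200, mean = 20, sd = 5)
bw <- 2
n_obs <- sum(!is.na(value))

# Bins of width bw covering the whole sample
breaks <- seq(floor(min(value)) - bw, ceiling(max(value)) + bw, by = bw)
h <- hist(value, breaks = breaks, plot = FALSE)

# Histogram area: sum of (bar height * bar width) = n_obs * bw
hist_area <- sum(h$counts * diff(h$breaks))

# Normal density rescaled by the same factor, integrated over +/- 10 sd
m <- mean(value); s <- sd(value)
scaled_dnorm <- function(x) dnorm(x, mean = m, sd = s) * bw * n_obs
curve_area <- integrate(scaled_dnorm, m - 10 * s, m + 10 * s)$value

c(hist_area = hist_area, curve_area = curve_area, bw_times_n = bw * n_obs)
```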
If you want the y-axis to have frequency counts, there are a number of options:
First simulate some data.
library(ggplot2)
set.seed(1)
dat_hist <- data.frame(
group = c(rep("A", 200), rep("B",150)),
value = c(rnorm(200, 20, 5), rnorm(150,25,10)))
# Set desired binwidth and number of non-missing obs
bw = 2
n_obs = sum(!is.na(dat_hist$value))
Option 1: Plot both histogram and density curve as density and then rescale the y axis
This is perhaps the easiest approach for a single histogram.
Using the approach suggested by Carlos, plot both the histogram and the density curve on the density scale:
g <- ggplot(dat_hist, aes(value)) +
geom_histogram(aes(y = ..density..), binwidth = bw, colour = "black") +
stat_function(fun = dnorm, args = list(mean = mean(dat_hist$value), sd = sd(dat_hist$value)))
And then rescale the y axis.
ybreaks = seq(0,50,5)
## On primary axis
g + scale_y_continuous("Counts", breaks = round(ybreaks / (bw * n_obs),3), labels = ybreaks)
## Or on secondary axis
g + scale_y_continuous("Density", sec.axis = sec_axis(
trans = ~ . * bw * n_obs, name = "Counts", breaks = ybreaks))
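Both axis variants rely on the per-bin relationship count = density * bw * n_obs, which is also how the count breaks are placed on the primary axis (ybreaks / (bw * n_obs)). A quick base-R check with hist(), which reports both quantities; the sample mirrors dat_hist's values but is only illustrative:

```r
set.seed(1)
value <- c(rnorm(200, 20, 5), rnorm(150, 25, 10))
bw <- 2
n_obs <- sum(!is.na(value))

breaks <- seq(floor(min(value)) - bw, ceiling(max(value)) + bw, by = bw)
h <- hist(value, breaks = breaks, plot = FALSE)

# Converting densities back to counts
counts_from_density <- h$density * bw * n_obs
all.equal(counts_from_density, as.numeric(h$counts))

# Count breaks expressed on the density scale, as on the primary axis
ybreaks <- seq(0, 50, 5)
round(ybreaks / (bw * n_obs), 3)
```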
Option 2: Rescale the density curve using stat_function
With code tidied as per PatrickT's answer.
ggplot(dat_hist, aes(value)) +
geom_histogram(colour = "black", binwidth = bw) +
stat_function(fun = function(x)
dnorm(x, mean = mean(dat_hist$value), sd = sd(dat_hist$value)) * bw * n_obs)
Option 3: Create an external dataset and plot using geom_line.
Unlike the above options, this one works with facets. (Edited to provide a dplyr-based rather than a plyr-based solution.) Note that the summarised dataset is used as the primary data, and the raw data is passed in only for the histogram.
library(tidyverse)
dat_hist %>%
group_by(group) %>%
nest(data = c(value)) %>%
mutate(y = map(data, ~ dnorm(
.$value, mean = mean(.$value), sd = sd(.$value)
) * bw * sum(!is.na(.$value)))) %>%
unnest(c(data,y)) %>%
ggplot(aes(x = value)) +
geom_histogram(data = dat_hist, binwidth = bw, colour = "black") +
geom_line(aes(y = y)) +
facet_wrap(~ group)
Option 4: Create external functions to edit the data on the fly
A bit over the top perhaps, but might be useful for someone?
## Function to create scaled dnorm data along full x axis range
dnorm_scaled <- function(data, x = NULL, binwidth = 1, xlim = NULL) {
.x <- na.omit(data[,x])
if(is.null(xlim))
xlim = c(min(.x), max(.x))
x_range = seq(xlim[1], xlim[2], length.out = 101)
setNames(
data.frame(
x = x_range,
y = dnorm(x_range, mean = mean(.x), sd = sd(.x)) * length(.x) * binwidth),
c(x, "y"))
}
## Function to apply over groups
dnorm_scaled_group <- function(data, x = NULL, group = NULL, binwidth = NULL, xlim = NULL) {
dat_hists <- lapply(
split(data, data[, group]), dnorm_scaled,
x = x, binwidth = binwidth, xlim = xlim)
for(g in names(dat_hists))
dat_hists[[g]][, "group"] <- g
setNames(do.call(rbind, dat_hists), c(x, "y", group))
}
## Single histogram
ggplot(dat_hist, aes(value)) +
geom_histogram(binwidth = bw, colour = "black") +
geom_line(data = ~ dnorm_scaled(., "value", binwidth = bw),
aes(y = y))
## With a single faceting variable
ggplot(dat_hist, aes(value)) +
geom_histogram(binwidth = 2, colour = "black") +
geom_line(data = ~ dnorm_scaled_group(
., x = "value", group = "group", binwidth = 2, xlim = c(0,50)),
aes(y = y)) +
facet_wrap(~ group)
This is an extended comment on JWilliman's answer. I found J's answer very useful. While playing around I discovered a way to simplify the code. I'm not saying it is a better way, but I thought I would mention it.
Note that JWilliman's answer puts counts on the y-axis and uses a "hack" to scale the corresponding normal density approximation (which would otherwise cover a total area of 1 and therefore have a much lower peak).
Main point of this comment: simpler syntax inside stat_function, by passing the needed parameters to the aesthetics function, e.g.
aes(x = x, mean = 0, sd = 1, binwidth = 0.3, n = 1000)
This avoids having to pass args = to stat_function and is therefore more user-friendly. Okay, it's not very different, but hopefully someone will find it interesting.
# parameters that will be passed to stat_function()
n = 1000
mean = 0
sd = 1
binwidth = 0.3 # passed to geom_histogram and stat_function
set.seed(1)
df <- data.frame(x = rnorm(n, mean, sd))
ggplot(df, aes(x = x, mean = mean, sd = sd, binwidth = binwidth, n = n)) +
theme_bw() +
geom_histogram(binwidth = binwidth,
colour = "white", fill = "cornflowerblue", size = 0.1) +
stat_function(fun = function(x) dnorm(x, mean = mean, sd = sd) * n * binwidth,
color = "darkred", size = 1)
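In the code above, the anonymous function finds mean, sd, n and binwidth through R's lexical scoping: it reads the variables of those names from the environment where it was defined when ggplot2 evaluates it. If you prefer the values fixed at the moment the function is created, a small closure factory does that; make_scaled_dnorm() is a hypothetical helper, not part of ggplot2:

```r
# Hypothetical helper: returns dnorm() rescaled to histogram counts,
# with its parameters captured when the function is created
make_scaled_dnorm <- function(mean, sd, n, binwidth) {
  force(mean); force(sd); force(n); force(binwidth)
  function(x) dnorm(x, mean = mean, sd = sd) * n * binwidth
}

# Two independent curves; each keeps its own parameters
f1 <- make_scaled_dnorm(mean = 0, sd = 1, n = 1000, binwidth = 0.3)
f2 <- make_scaled_dnorm(mean = 2, sd = 0.5, n = 500, binwidth = 0.1)

f1(0)   # dnorm(0) * 1000 * 0.3, about 119.7
f2(2)   # dnorm(0, sd = 0.5) * 500 * 0.1, about 39.9
```

In the plot above it would be used as stat_function(fun = make_scaled_dnorm(mean, sd, n, binwidth)).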
This code should do it:
library(ggplot2)
set.seed(1)
z <- rnorm(1000)
qplot(z, geom = "blank") +
geom_histogram(aes(y = ..density..)) +
stat_density(geom = "line", aes(colour = "bla")) +
stat_function(fun = dnorm, aes(x = z, colour = "blabla")) +
scale_colour_manual(name = "", values = c("red", "green"),
breaks = c("bla", "blabla"),
labels = c("kernel_est", "norm_curv")) +
theme(legend.position = "bottom", legend.direction = "horizontal")
Note: I used qplot(), but you can use the more versatile ggplot().
Here's a tidyverse informed version:
Setup
library(tidyverse)
Some data
d <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/openintro/speed_gender_height.csv")
Preparing data
We'll add a "total" histogram for the whole sample; to that end, we need to remove the grouping information from the data.
d2 <-
d |>
select(-gender)
Here's a data set with summary data:
d_summary <-
d %>%
group_by(gender) %>%
summarise(height_m = mean(height, na.rm = T),
height_sd = sd(height, na.rm = T))
d_summary
Plot it
d %>%
ggplot() +
aes() +
geom_histogram(aes(y = ..density.., x = height, fill = gender)) +
facet_wrap(~ gender) +
geom_histogram(data = d2, aes(y = ..density.., x = height),
alpha = .5) +
stat_function(data = d_summary %>% filter(gender == "female"),
fun = dnorm,
#color = "red",
args = list(mean = filter(d_summary,
gender == "female")$height_m,
sd = filter(d_summary,
gender == "female")$height_sd)) +
stat_function(data = d_summary %>% filter(gender == "male"),
fun = dnorm,
#color = "red",
args = list(mean = filter(d_summary,
gender == "male")$height_m,
sd = filter(d_summary,
gender == "male")$height_sd)) +
theme(legend.position = "none",
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
labs(title = "Facetted histograms with overlaid normal curves",
caption = "The grey histograms show the whole distribution over both groups, i.e. females and males") +
scale_fill_brewer(type = "qual", palette = "Set1")
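Writing one stat_function() per group gets repetitive as the number of groups grows. An alternative, in the spirit of Option 3 above, is to precompute each group's normal curve on a grid and draw it with a single geom_line(); because the result has a gender column, facet_wrap(~ gender) puts each curve in its own panel. A base-R sketch; the simulated heights are a hypothetical stand-in for the downloaded data:

```r
set.seed(42)
# Hypothetical stand-in for the downloaded speed_gender_height data
d <- data.frame(
  gender = rep(c("female", "male"), each = 100),
  height = c(rnorm(100, mean = 65, sd = 3), rnorm(100, mean = 70, sd = 3))
)

# Common x grid spanning all observations
grid <- seq(min(d$height, na.rm = TRUE), max(d$height, na.rm = TRUE),
            length.out = 101)

# One normal curve per group, using that group's mean and sd
groups <- split(d$height, d$gender)
curves <- do.call(rbind, lapply(names(groups), function(g) {
  h <- groups[[g]]
  data.frame(gender = g, height = grid,
             density = dnorm(grid, mean = mean(h, na.rm = TRUE),
                             sd = sd(h, na.rm = TRUE)))
}))
head(curves)
```

The two stat_function() layers could then be replaced by geom_line(data = curves, aes(x = height, y = density)).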