How to highlight points from a hist chart in R

I have some trouble with my code. I'm a complete beginner in R, so I would like some help. I have a data frame and I need to make a histogram and then highlight some points, but I cannot work out how to find those points in my dataset. Here is an example of what I have.
x <- c("a","b","c","d","f","g","h","i","j","k")
y <- c(197421,77506,130474,18365,30470,22518,70183,15378,29747,11148)
z <- data.frame(x,y)
hist(z$y)
For example, how can I find where the "a" and "h" values are placed in the histogram? And in a barplot? I tried the points() function, but I cannot find the coordinates. Please let me know how I could do that. Thanks in advance.

Here is a way with dplyr and ggplot2. The approach is to cut the y variable into bins and then use summarise to create the counts and the labels.
library(dplyr)
library(ggplot2)
z %>%
mutate(bins = cut(y, seq(0, 200000, 50000))) %>%
group_by(bins) %>%
summarise(xes = paste0(x, collapse = ", "),
count = n()) %>%
ggplot() +
geom_bar(aes(x = bins, y = count), stat = "identity", color = "black", fill = "grey") +
geom_text(aes(x = bins, y = count + 0.5, label = xes)) +
xlab("y")
Here is a more complicated way that makes a plot that looks more like what hist() produces.
library(tidyr)   # provides separate()
library(stringr) # provides str_remove()
z2 <- z %>%
mutate(bins = cut(y, seq(0, 200000, 50000))) %>%
group_by(bins) %>%
summarise(xes = paste0(x, collapse = ", "),
count = n()) %>%
separate(bins, into = c("start", "end"), sep = ",") %>%
mutate(across(start:end, ~as.numeric(str_remove(., "\\(|\\]"))))
ggplot() +
geom_histogram(data = z, aes(x = y), breaks = seq(0, 200000, 50000),
color = "black", fill = "grey") +
geom_text(data = z2, aes(x = (start + end) / 2, y = count + 0.5, label = xes))
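For the base R functions the question asks about, the coordinates can be read off the objects that hist() and barplot() return. The following is a minimal sketch, not part of the answer above, assuming the same breaks as the ggplot2 version; the names h, sel, bp and hit are introduced here only for illustration.

```r
x <- c("a","b","c","d","f","g","h","i","j","k")
y <- c(197421,77506,130474,18365,30470,22518,70183,15378,29747,11148)
z <- data.frame(x, y)

# hist() returns the breaks, counts and bin midpoints it used
h <- hist(z$y, breaks = seq(0, 200000, 50000), main = "", xlab = "y")

# Bin index of every observation, using the same (a, b] intervals as hist()
z$bin <- cut(z$y, breaks = h$breaks, include.lowest = TRUE, labels = FALSE)
sel <- z[z$x %in% c("a", "h"), ]

# Mark the raw values on the x axis and label the bars they fall into
points(sel$y, rep(0, nrow(sel)), pch = 19, col = "red")
text(h$mids[sel$bin], h$counts[sel$bin], labels = sel$x, pos = 3)

# barplot() returns the x positions of the bar midpoints
bp <- barplot(z$y, names.arg = z$x)
hit <- z$x %in% c("a", "h")

# Put a marker halfway up the bars for "a" and "h"
points(bp[hit], z$y[hit] / 2, pch = 19, col = "red")
```

Here sel$bin tells you which bar of the histogram each selected value belongs to, and bp[hit] gives the x coordinates of the matching bars in the barplot.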

Related

Shading regions of a plot based on whether a condition is satisfied

I'm creating lineplots using ggplot() and geom_line() for a corridor of values that develops over time.
It may happen sometimes that the upper bound is below the lower bound (which I'll call "inversion"), and I would like to highlight regions where this happens in my plot, say by using a different background color.
Searching both Google and StackOverflow has not led me anywhere.
Here is an artificial example:
library(tidyverse)
library(RcppRoll)
set.seed(42)
N <- 100
l <- 5
a <- rgamma(n = N, shape = 2)
d <- tibble(x = 1:N, upper = roll_maxr(a, n = l), lower = roll_minr(a + lag(a), n = l)) %>% mutate(inversion = upper < lower)
dl <- pivot_longer(d, cols = c("upper", "lower"), names_to = "Bounds", values_to = "bound_vals")
ggplot(dl, mapping = aes(x = x, y = bound_vals, color = Bounds)) + geom_line(linewidth = 1) + theme_light()
This produces the following plot:
As you can see, inversion occurs in a few places, e.g. around x = 50. I would like for the plot to have a darker (say gray) background where it does, based on the inversion column already in the tibble. How can I do this?
Thank you very much for the help!
One option to achieve your desired result would be to use ggh4x::stat_difference like so. Note that to this end we have to use the wide dataset and accordingly add the lines via two geom_line.
library(ggplot2)
library(ggh4x)
ggplot(d, mapping = aes(x = x)) +
stat_difference(aes(ymin = lower, ymax = upper)) +
geom_line(aes(y = lower, color = "lower"), linewidth = 1) +
geom_line(aes(y = upper, color = "upper"), linewidth = 1) +
scale_fill_manual(values = c("+" = "transparent", "-" = "darkgrey"),
breaks = "-",
labels = "Inversion") +
theme_light() +
labs(color = "Bounds")
EDIT: Of course it is also possible to draw background rects for the intersection regions. I don't know of any out-of-the-box option, though; the tricky part is computing the x values where the lines intersect, which requires some effort and approximation. Here is one approach, though probably not the most efficient one.
library(tidyverse)
# Compute intersection points and prepare data to draw rects
n <- 20 # Increase for a better approximation
rect <- data.frame(
x = seq(1, N, length.out = N * n)
)
# Shamefully stolen from ggh4x
rle_id <- function(x) with(rle(x), rep.int(seq_along(values), lengths))
rect <- rect |>
mutate(lower = approx(d$x, d$lower, x)[["y"]],
upper = approx(d$x, d$upper, x)[["y"]],
inversion = upper < lower,
rle = rle_id(inversion & !is.na(inversion))
) |>
filter(inversion) |>
group_by(rle) |>
slice(c(1, n())) |>
mutate(label = c("xmin", "xmax")) |>
ungroup() |>
select(x, rle, label) |>
pivot_wider(names_from = label, values_from = x)
ggplot(dl, mapping = aes(x = x, y = bound_vals, color = Bounds)) +
geom_line(linewidth = 1) +
geom_rect(data = rect, aes(xmin = xmin, xmax = xmax, group = rle),
ymin = -Inf, ymax = Inf, fill = "darkgrey", alpha = .3, inherit.aes = FALSE) +
theme_light()
#> Warning: Removed 9 rows containing missing values (`geom_line()`).
Answering my own question: the following worked for me in the end (also with actual data and plots grouped with facet_wrap()); h/t to @stefan, whose approach with geom_rect() I recycled:
library(tidyverse)
library(RcppRoll)
set.seed(42)
N <- 100
l <- 5
a <- rgamma(n = N, shape = 2)
d <- tibble(x = 1:N, upper = roll_maxr(a, n = l), lower = roll_minr(a + lag(a), n = l)) %>%
mutate(inversion = upper < lower,
inversionLag = if_else(is.na(lag(inversion)), FALSE, lag(inversion)),
inversionLead = if_else(is.na(lead(inversion)), FALSE, lead(inversion)),
inversionStart = inversion & !inversionLag,
inversionEnd = inversion & !inversionLead
)
dl <- pivot_longer(d, cols = c("upper", "lower"), names_to = "Bounds", values_to = "bound_vals")
iS <- d %>% filter(inversionStart) %>% select(x) %>% rowid_to_column() %>% rename(iS = x)
iE <- d %>% filter(inversionEnd) %>% select(x) %>% rowid_to_column() %>% rename(iE = x)
iD <- iS %>% full_join(iE, by = c("rowid"))
g <- ggplot(dl, mapping = aes(x = x, y = bound_vals, color = Bounds)) +
geom_line(linewidth = 1) +
geom_rect(data = iD, mapping = aes(xmin = iS, xmax = iE, fill = "Inversion"), ymin = -Inf, ymax = Inf, alpha = 0.3, inherit.aes = FALSE) +
scale_fill_manual(name = "Inversions", values = "darkgray") +
theme_light()
g
This gives
which is pretty much what I was after.
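The start and end of each inversion can also be computed with run-length encoding instead of lag() and lead(). This is only a sketch under the assumption that base R's rle() is acceptable here; inversion_runs is a hypothetical helper, and the small inv vector stands in for the inversion column.

```r
# Hypothetical helper: first and last x of every run of TRUE values
inversion_runs <- function(x, inversion) {
  runs <- rle(!is.na(inversion) & inversion)  # NA counts as "no inversion"
  ends <- cumsum(runs$lengths)                # last index of each run
  starts <- ends - runs$lengths + 1           # first index of each run
  data.frame(iS = x[starts[runs$values]], iE = x[ends[runs$values]])
}

# Toy stand-in for d$inversion
inv <- c(NA, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
iD <- inversion_runs(seq_along(inv), inv)
iD
```

With the data above, inversion_runs(d$x, d$inversion) should give a data frame that can be passed to geom_rect() in place of iD. A run consisting of a single point gets iS equal to iE, so its rectangle has zero width, just as in the lag/lead version.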

gganimate - have geom_rect adjust each frame

I have the following data:
library(ggplot2)
library(gganimate)
library(tidyverse)
createData<- function(vintage, id){#create data
# Generate a sequence of dates from 2010-01-01 to 2025-12-31 with a quarterly frequency
Dates <- seq(from = as.Date("2010-01-01"), to = as.Date("2025-12-31"), by = "quarter")
RLG<- cumsum(sample(c(-1, 1), 64, TRUE))
df<- data.frame( Dates,RLG, vintage,id)
return(df)
}
#createData
df<- createData("2018-01-01",1) %>%
rbind(createData("2019-01-01",2))%>%
rbind(createData("2020-01-01",3)) %>%
rbind(createData("2021-01-01",4))%>%
rbind(createData("2022-01-01",5))%>%
rbind(createData("2023-01-01",6))%>%
rbind(createData("2024-01-01",7))%>%
rbind(createData("2025-01-01",8))
Which I use to make the following chart:
options(gganimate.nframes = 8*length(unique(df$vintage)), gganimate.res = 30)
p<- ggplot(df) +
aes(x = Dates, y = RLG, group = as.Date(vintage), colour = "RLG") +
geom_line()+
scale_y_continuous(labels = \(x) paste0(x, "%"))+
theme(axis.title = element_blank(),legend.position="none")+
transition_time(id)+
exit_fade(alpha = 0.5)+
shadow_mark(alpha = 0.2)
animate(p, end_pause = 30)
I would like to add a geom_rect which goes from vintage to max(Dates). At each frame, vintage will increase, so the geom_rect will shrink slightly. How can I do this without interfering with the shadow_mark and exit_fades which I am applying to the lines?
If you mean something like a progress bar you could do it like so:
Create a data frame for the geom which is a subset of the original:
df_geom <- df |>
mutate(vintage = as.Date(vintage)) |>
group_by(id) |>
slice(n())
Use geom_segment with the data frame from above.
If you want to leave shadow_mark in you can do shadow_mark(exclude_layer = 2).
p <- ggplot(df) +
aes(x = Dates, y = RLG, group = as.Date(vintage), colour = RLG) +
geom_line()+
scale_y_continuous(labels = \(x) paste0(x, "%"))+
theme(axis.title = element_blank(),legend.position="none") +
geom_segment(
data = df_geom,
mapping = aes(x=vintage, xend=Dates,
y = 18, yend = 18),
size = 10, alpha =.4, color ='lightblue'
) +
transition_time(id)+
exit_fade(alpha = 0.5)
# shadow_mark(alpha = 0.2)
animate(p)
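If a rectangle from vintage to max(Dates) is wanted rather than a segment, the same idea of a small per-state data frame applies. The sketch below only builds that data frame in base R; rect_df, xmin, xmax and width_days are names made up here, and how gganimate tweens such a layer is an assumption that has not been checked.

```r
# Quarterly dates as in createData()
Dates <- seq(from = as.Date("2010-01-01"), to = as.Date("2025-12-31"), by = "quarter")

# One row per animation state: the rectangle starts at the vintage
# and ends at the last available date
rect_df <- data.frame(
  id   = 1:8,
  xmin = as.Date(paste0(2018:2025, "-01-01")),
  xmax = max(Dates)
)

# The width shrinks as the vintage moves forward
rect_df$width_days <- as.numeric(rect_df$xmax - rect_df$xmin)
rect_df
```

Such a data frame could then be passed to a layer like geom_rect(data = rect_df, aes(xmin = xmin, xmax = xmax), ymin = -Inf, ymax = Inf, inherit.aes = FALSE), and that layer excluded from the shadow via shadow_mark(exclude_layer = ...).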

Simple one about Alluvial plot in R

I would like to make a simple flow graph.
Here is my code:
library(tidyverse)
library(ggalluvial)
## Data
x = tibble(qms = c("FLOW", "FLOW"),
move1 = c("Birth", "Birth"),
move2 = c("Direct", NA),
freq = c(100, 50))
## Graph
x %>%
mutate(id = qms) %>%
to_lodes_form(axis = 2:3, id = id) %>%
na.omit() %>%
ggplot(aes(x = x, stratum = stratum, alluvium = id,
y = freq, label = stratum)) +
scale_x_discrete(expand = c(.1, .1)) +
geom_flow(aes(fill = qms),stat = "alluvium") +
geom_stratum(aes(fill = stratum), show.legend=FALSE) +
geom_text(stat = "stratum", size = 3)
This is the outcome:
My desired outcome is that:
How can I express the decreasing pattern with the missing value?
By slightly reshaping your data you can get what you want. I think the key is to map the alluvium to something fixed like 1 so that it will be a single flow, and mapping stratum to the same variable as x.
library(tidyverse)
library(ggalluvial)
x <- tibble(x = c("Birth", "Direct"),
y = c(100, 50))
x %>%
ggplot(aes(x, y, alluvium = 1, stratum = x)) +
geom_alluvium() +
geom_stratum()
Created on 2022-11-15 with reprex v2.0.2

2D summary plot with counts as labels

I have measurements of a quantity (value) at specific points (lon and lat), like the example data below:
library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15),
lat = runif(1000, 40, 60),
value = rnorm(1000))
I want to make a 2D summary (e.g. mean) of the measured values with color in space and on top of that I want to show the counts as labels.
I can plot the summary and the labels separately:
## Left plot
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex")
## Right plot
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_binhex(aes(label = ..count..), bins = 5, geom = "text")
But when I combine both I lose the summary:
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
stat_binhex(aes(label = ..count..), bins = 5, geom = "text")
I can achieve the opposite, counts as color and summary as labels:
ggplot(dat, aes(lon, lat, z = value)) +
geom_hex(bins = 5) +
stat_summary_hex(aes(label=..value..), bins = 5,
fun = function(x) round(mean(x), 3),
geom = "text")
While writing the question, which took some hours of testing, I found a solution: adding fill = NULL or fill = mean(value) to the aes() of the text layer gives me what I want. Below are the code and the resulting plots; the only difference is the label of the legend.
But it feels very hacky, so I would appreciate a better solution.
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
stat_binhex(aes(label = ..count.., fill = NULL), bins = 5, geom = "text") +
theme_bw()
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
stat_binhex(aes(label = ..count.., fill = mean(value)), bins = 5, geom = "text") +
theme_bw()
I propose a completely different approach to this problem. However, it needs to be clarified a bit first. You write "I have measurements of a quantity (value) at specific points (lon and lat)" but you do not specify these points exactly. Your data (generated) contains 1000 lon points and the same number of lat points.
Anyway, see for yourself.
library(tidyverse)
set.seed(1)
dat <-
tibble(
lon = runif(1000, 1, 15),
lat = runif(1000, 40, 60),
value = rnorm(1000)
)
dat %>% distinct(lon) %>% nrow() #1000
dat %>% distinct(lat) %>% nrow() #1000
My guess is that for real data you have a much smaller set of values for lon and lat.
Let me round the coordinates to a grid of 2.
grid = 2
# Store the summarised data so that it can be plotted below
datg = dat %>% mutate(
lon = round(lon/grid)*grid,
lat = round(lat/grid)*grid,
) %>%
group_by(lon, lat) %>%
summarise(
mean = mean(value),
label = n()
)
datg
As you can see, after rounding, the data was grouped by these two variables, and then I calculated the statistics you are interested in (mean and number of observations).
Also note that these statistics are generated at the intersections of lon and lat, so we have a square grid. In your solution, this is not the case at all: you are not getting the number of observations at these points, and your grid is not square.
So let's make a graph.
# Plot the summarised data (datg), which holds the mean and label columns
datg %>% ggplot(aes(lon,lat,z=mean)) +
geom_contour_filled(binwidth = 0.25) +
geom_text(aes(label = label)) +
theme_bw()
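To see what the rounding step round(lon/grid)*grid does, here is a small worked example with made-up coordinates (the vector lon and the name snapped are purely illustrative): every value is snapped to the nearest multiple of grid, and counting the snapped values gives the number of observations per grid point.

```r
grid <- 2
lon <- c(1.2, 2.9, 3.1, 4.0, 6.7)

# Divide by the grid size, round to the nearest integer, scale back
snapped <- round(lon / grid) * grid
snapped
# 2 2 4 4 6

# Number of observations that ended up on each grid point
table(snapped)
```

A larger grid merges more observations into each point, which is why the counts grow when grid is increased.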
Nothing stands in the way of increasing your grid a bit, let's say 4.
grid = 4
datg = dat %>% mutate(
lon = round(lon/grid)*grid,
lat = round(lat/grid)*grid,
) %>%
group_by(lon, lat) %>%
summarise(
mean = mean(value),
label = n()
)
datg %>% ggplot(aes(lon,lat,z=mean)) +
geom_contour_filled(binwidth = 0.25) +
geom_text(aes(label = label)) +
theme_bw()
Using such a solution, we can easily supplement the labels in the points of interest to us, e.g. with the average value. This time we will use grid = 1.5.
grid = 1.5
datg = dat %>% mutate(
lon = round(lon/grid)*grid,
lat = round(lat/grid)*grid,
) %>%
group_by(lon, lat) %>%
summarise(
mean = mean(value),
label = n(),
lab2 = paste0("(", round(mean, 2), ")")
)
datg %>% ggplot(aes(lon,lat,z=mean)) +
geom_contour_filled(binwidth = 0.25) +
geom_text(aes(label = label)) +
geom_text(aes(label = lab2), nudge_y = -.5, size = 3) +
theme_bw()
Hope this solution fits your needs much better than the stat_binhex based solution.
The problem here is that both layers share the same fill scale.
As the ranges differ (0 to 40 for the counts versus -1.5 to 0.5 for the means), the larger range makes the values of the smaller range appear in (almost) the same color.
This is why displaying the count as color works, but the opposite doesn't seem to work.
As an illustration, if you rescale the mean calculation, the color variations become visible:
rescaled_mean <- function(x) mean(x)*40
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "rescaled_mean", geom = "hex")+
stat_binhex(aes(label = ..count..), bins = 5, geom = "text") +
theme_bw()
To be fair, I find this very strange behaviour. I like your solution though; I really don't find adding fill = NULL very hacky. On the contrary, I find it very elegant. Here is a more hacky approach that gives basically the same result, but with one more line. It uses ggnewscale.
library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15),
lat = runif(1000, 40, 60),
value = rnorm(1000))
ggplot(dat) +
aes(x = lon, y = lat,z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
ggnewscale::new_scale_fill() +
stat_binhex(aes(label = ..count..), bins = 5, geom = "text")
Created on 2022-02-17 by the reprex package (v2.0.1)

ggplot facet_wrap not ordering facets alphabetically R

I've spent the past few days looking through so many forums and sites, so I hope you can help.
You can find the data I've been using here, as well as the three model predictions.
I'm predicting subjective well-being (i.e. positive affect, negative affect, and life satisfaction) from last night's person-centered sleep satisfaction. I came up with three models that I now want to plot next to each other. The problem is that facet_wrap puts the models next to each other alphabetically and not how I want them (positive affect, negative affect, and life satisfaction).
You can view my current graph here
This is my code to get the graph going:
library("afex")
library("tidyverse")
library("tidylog")
theme_set(theme_bw(base_size = 15))
library("sjPlot")
d3 <- read.csv("d3.csv")
d3 <- d3 %>%
group_by(ID) %>%
mutate(SD_person_centred = sleepDur - mean(sleepDur, na.rm = TRUE)) %>%
mutate(sleep_satisfaction_person_centred = Sleep_quality_open - mean(Sleep_quality_open, na.rm = TRUE)) %>%
mutate(MS_person_centred = mid_sleep_modified - mean(mid_sleep_modified, na.rm = TRUE)) %>%
mutate(MS_person_freeday_centred = abs(mid_sleep_modified -
mean(mid_sleep_modified[Routine_work_day_open == "No"], na.rm = TRUE))) %>%
mutate(MS_person_mctq_centred = abs(mid_sleep_modified - MCTQ_MSF_number)) %>%
mutate(sleep_onset_person_centred = Sleep_Onset_open - mean(Sleep_Onset_open, na.rm = TRUE)) %>%
mutate(sleep_efficiency_person_centred = SleepEfficiency_act - mean(SleepEfficiency_act, na.rm = TRUE)) %>%
ungroup
m_p_sls_1 <- readRDS("m_p_sls_1.rds")
m_n_sls_1 <- readRDS("m_n_sls_1.rds")
m_s_sls_1 <- readRDS("m_s_sls_1.rds")
tmp <- get_model_data(m_p_sls_1$full_model, type = "pred", terms = "sleep_satisfaction_person_centred")
tmp$DV <- "positive_affect"
tmp2 <- get_model_data(m_n_sls_1$full_model, type = "pred", terms = "sleep_satisfaction_person_centred")
tmp2$DV <- "negative_affect"
tmp3 <- get_model_data(m_s_sls_1$full_model, type = "pred", terms = "sleep_satisfaction_person_centred")
tmp3$DV <- "life_satisfaction"
tmp <- bind_rows(tmp, tmp2, tmp3)
tmp
tmp$DV
Here I change tmp$DV into a factor as this was the solution I found online. However, this did not change anything:
tmp$DV <- factor(tmp$DV, levels=c("positive_affect","negative_affect","life_satisfaction"))
levels(tmp$DV)
This is my code for the graph:
variable_names <- list(
"positive_affect" = "positive affect" ,
"negative_affect" = "negative affect",
"life_satisfaction" = "life satisfaction"
)
variable_labeller <- function(variable,value){
return(variable_names[value])
}
d3 %>%
pivot_longer(cols="positive_affect":"life_satisfaction", names_to = "DV", values_to = "Score") %>%
ggplot(aes(x = sleep_satisfaction_person_centred, y = Score)) +
geom_ribbon(data = tmp, aes(x = x, ymin = conf.low, ymax = conf.high, y = predicted),
fill = "lightgrey") +
geom_line(data = tmp, aes(x = x, y = predicted, group = 1)) +
geom_point(alpha = 0.2) +
facet_wrap(~DV, scales = "free_y",labeller=variable_labeller) +
labs(y = "Score",
x = "Sleep satisfaction person centered")
When I give the factor of tmp$DV a different name, i.e. tmp$facet and add this to my code, I do get the right order, but the scales are not free on the y-axis anymore. Please have a look here.
tmp$facet <- factor(tmp$DV, levels=c("positive_affect", "negative_affect", "life_satisfaction"))
d3 %>%
pivot_longer(cols="positive_affect":"life_satisfaction", names_to = "DV", values_to = "Score") %>%
ggplot(aes(x = sleep_satisfaction_person_centred, y = Score)) +
geom_ribbon(data = tmp, aes(x = x, ymin = conf.low, ymax = conf.high, y = predicted),
fill = "lightgrey") +
geom_line(data = tmp, aes(x = x, y = predicted, group = 1)) +
geom_point(alpha = 0.2) +
facet_wrap(~facet, scales = "free_y",labeller=variable_labeller) +
labs(y = "Score",
x = "Sleep satisfaction person centered")
When I change pivot_longer to facet in the first row, I get the same graph as the one before.
Sorry for the long post, but I tried to be as clear as possible. Please let me know if I wasn't.
I'd appreciate any kind of hints. Thanks a lot for your time.
All the best,
Anita
Just got the answer from my colleague Henrik Singmann, in case anybody was wondering:
d3 %>%
pivot_longer(cols="positive_affect":"life_satisfaction", names_to = "DV", values_to = "Score") %>%
mutate(DV = factor(DV, levels=c("positive_affect","negative_affect","life_satisfaction"))) %>%
ggplot(aes(x = sleep_satisfaction_person_centred, y = Score)) +
geom_ribbon(data = tmp, aes(x = x, ymin = conf.low, ymax = conf.high, y = predicted),
fill = "lightgrey") +
geom_line(data = tmp, aes(x = x, y = predicted, group = 1)) +
geom_point(alpha = 0.2) +
facet_wrap(~DV, scales = "free_y",labeller=variable_labeller) +
labs(y = "Score",
x = "Sleep satisfaction person centered")
So the factor needs to be defined in d3 before being handed over to ggplot.
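The effect of the levels argument can be checked without any plotting. This is a minimal base R sketch (dv and dv_ordered are illustrative names); it assumes, as the answer above suggests, that facet_wrap() follows the level order of the faceting variable in the main data.

```r
dv <- c("positive_affect", "negative_affect", "life_satisfaction")

# Without explicit levels, the levels are sorted alphabetically
levels(factor(dv))

# With explicit levels, the given order is kept
dv_ordered <- factor(dv, levels = c("positive_affect", "negative_affect", "life_satisfaction"))
levels(dv_ordered)
```

Since pivot_longer() returns DV as a character column, converting it with mutate() as shown above is what fixes the order.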
