Difference between geom_col() and geom_point() for same value - r

I'm trying to plot missing values over time (longitudinal data).
I would prefer to use geom_col() so that I can later fill the bars with colours for certain treatments. But geom_col() gives me unexpectedly large values, while geom_point() gives me the correct values from the same code. I'm trying to understand why this happens. Take a look at the y-axis.
Disclaimer:
I know the missing values disappear on days 19-20. That is why I'm making the plot.
Sorry about the layout of the plot. It's not polished yet.
For the geom_point:
gaussian_transformed %>% group_by(factor(time)) %>% mutate(missing = sum(is.na(Rose_width))) %>% ggplot(aes(x = factor(time), y = missing)) + geom_point()
Picture: geom_point
For the geom_col:
gaussian_transformed %>% group_by(factor(time)) %>% mutate(missing = sum(is.na(Rose_width))) %>% ggplot(aes(x = factor(time), y = missing)) + geom_col()
Picture: geom_col

The problem is that you're using mutate, which keeps one row per observation, so every group ends up with several rows that all carry the same missing value. You cannot see it, but many points overlap in your geom_point plot, whereas geom_col stacks all of those rows on top of each other.
One fix is to use summarise; another is to use distinct.
Compare
library(tidyverse)
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_point()
The points look ugly because there is a lot of overplotting.
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
distinct(order, .keep_all = TRUE) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
Created on 2021-06-02 by the reprex package (v2.0.0)
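
To make the stacking explicit, here is a minimal sketch (not part of the original answer) that reproduces the inflated bar heights without drawing anything. It assumes that geom_col() uses its default position = "stack", so the height of each bar is the sum of y over all rows in that group. The names per_row, per_group, stacked and check are made up for this illustration.

```r
library(dplyr)
library(ggplot2)  # only needed here for the msleep dataset

# mutate() keeps every row, so the group count is repeated on each row
per_row <- msleep %>%
  group_by(order) %>%
  mutate(missing = sum(is.na(sleep_cycle))) %>%
  ungroup()

# summarise() collapses each group to a single row
per_group <- msleep %>%
  group_by(order) %>%
  summarise(missing = sum(is.na(sleep_cycle)), n_rows = n())

# What a stacked bar effectively shows: the sum of y over all rows in the group
stacked <- per_row %>%
  group_by(order) %>%
  summarise(bar_height = sum(missing))

# The inflated height should equal (true count) x (number of rows in the group)
check <- inner_join(per_group, stacked, by = "order")
all(check$bar_height == check$missing * check$n_rows)
```

If the last line returns TRUE, the "weird" values in the geom_col plot are simply the correct count multiplied by the group size.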

So after some digging:
What happened is that geom_col() stacks the bars of all rows in a group, so the repeated missing value gets summed, while geom_point() simply draws the repeated points on top of each other. Hence the large values for y. Doing the following worked fine for me:
gaussian_transformed$time <- as.factor(gaussian_transformed$time)
gaussian_transformed %>% group_by(time) %>% summarise(missing = sum(is.na(Rose_width))) -> gaussian_transformed
gaussian_transformed %>% ggplot(aes(x = time, y = missing)) + geom_col(fill = "blue", alpha = 0.5) + theme_minimal() + labs(title = "Missing values in Gaussian Outcome over the days", x = "Time (in days)", y = "Amount of missing values") + scale_y_continuous(breaks = seq(0, 10, 1))
With the plot: GaussianMissing
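
Since the original goal was to fill the bars by treatment, the same summarise idea can be extended to two grouping variables. This is only a sketch on toy data: the treatment column and the toy values are assumptions, as the real gaussian_transformed data is not shown in the question.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for gaussian_transformed; `treatment` is a hypothetical column
toy <- tibble(
  time = rep(1:3, each = 4),
  treatment = rep(c("A", "A", "B", "B"), times = 3),
  Rose_width = c(1.2, NA, NA, 2.0,
                 NA, NA, 1.1, 1.4,
                 0.9, 1.0, NA, 1.3)
)

# One row per time/treatment combination, so stacking is now meaningful
missing_by_group <- toy %>%
  group_by(time = factor(time), treatment) %>%
  summarise(missing = sum(is.na(Rose_width)), .groups = "drop")

# Stacked bars: total height = missing values per day, colours = treatments
p <- ggplot(missing_by_group, aes(x = time, y = missing, fill = treatment)) +
  geom_col()
```

Here the stacking that caused the original problem becomes useful: each bar segment is one treatment, and the full bar is the daily total.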

Related

How to stack partially matched time periods with geom_area (ggplot2)?

With the following example, I get a plot where the areas are not stacked. I would like to stack them. This should be a partial stack, intensity starting at 0.5, then reaching 0.8 where stacked, then reaching 0.3 at the end.
I assume that the position argument does not work as the start and end date are not the same.
Am I missing an argument that could solve this issue? Or maybe another geom?
Do I have to subset the data into days to get the desired output? If so, how can I achieve that?
Thanks in advance,
# Library
library(tidyverse)
library(lubridate)
# Data
df <- tibble(date_debut = as_date(c("2022-09-28", "2022-10-05")),
intensity = c(0.5, 0.3),
duration = days(c(14, 10)),
type = (c("a", "b")))
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
pivot_longer(cols = c(date_debut, date_fin),
names_to = "date_type",
values_to = "date")
# Plot
df %>%
ggplot(aes(x = date, y = intensity, fill = type))+
geom_area(position = "stack")
This is a tough data wrangling problem. The area plots only stack where the points in the two series have the same x values. The following will achieve that, though it's quite a profligate approach.
df %>%
mutate(interval = interval(date_debut, date_debut + duration)) %>%
group_by(type) %>%
summarize(time = seq(as.POSIXct(min(df$date_debut)),
as.POSIXct(max(df$date_debut + df$duration)), by = 'min'),
intensity = ifelse(time %within% interval, intensity, 0)) %>%
ggplot(aes(x = time, y = intensity, fill = type)) +
geom_area(position = position_stack())
Allan Cameron's answer inspired me to look further into complete.
The proposed answer solved my question, so I accepted it. However, it is indeed more complex than needed.
I solved it this way:
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
group_by(type) %>%
complete(date_debut = seq(min(date_debut), max(date_fin), by = "1 day")) %>%
fill(intensity) %>%
select(date_debut, intensity, type)
ggplot(df, aes(x = date_debut, y = intensity, fill = type)) +
geom_area()+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
To avoid the weird empty space, it is fine for me to use geom_col (the question was about geom_area, so no worries).
ggplot(df, aes(x = date_debut, y = intensity, fill = type, colour = type)) +
geom_col(width = 0.95)+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
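
For comparison, here is a hedged sketch of the "shared x values" idea at daily resolution. It puts both series on one common daily grid and sets the intensity to 0 outside each interval, then checks the stacked totals numerically. It is not taken from either answer: durations are stored as plain numbers of days instead of lubridate periods, and the names grid, daily and totals are made up for the example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- tibble(
  date_debut = as.Date(c("2022-09-28", "2022-10-05")),
  intensity = c(0.5, 0.3),
  duration = c(14, 10),  # days, as plain numbers (assumption: no lubridate)
  type = c("a", "b")
) %>%
  mutate(date_fin = date_debut + duration)

# One common daily grid covering both periods
grid <- seq(min(df$date_debut), max(df$date_fin), by = "1 day")

# Every type gets a value on every day: its intensity inside the interval, else 0
daily <- df %>%
  crossing(date = grid) %>%
  mutate(intensity = if_else(date >= date_debut & date <= date_fin, intensity, 0)) %>%
  select(date, type, intensity)

# The stacked height on each day is the sum over the types
totals <- daily %>%
  group_by(date) %>%
  summarise(total = sum(intensity))

p <- ggplot(daily, aes(x = date, y = intensity, fill = type)) +
  geom_area()
```

With this layout, totals should read 0.5 before the overlap, 0.8 during it and 0.3 after it, which is the partial stack described in the question.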

x-axis starting value for diverging plot

How can I change the "x-axis starting value" of the diverging bar chart below (extracted from here), so that the vertical axis is set at 25 instead of 0, and the bars are therefore drawn from 25 and not 0?
For instance, I want this chart:
To look like this:
EDIT
It is not the label I want to change, it is how the data is plotted. My apologies if I wasn't clear. See example below:
Another example to make it clear:
You can provide computed labels to an (x-)scale via scale_x_continuous(labels = function (x) x + 25).
If you also want to change the data, you’ll first need to offset the x-values by the equivalent amount (in the opposite direction):
Example:
library(tidyverse)
df = tibble(Color = c('red', 'green', 'blue'), Divergence = c(5, 10, -5))
offset = 2
df %>%
mutate(Divergence = Divergence - offset) %>%
ggplot() +
aes(x = Divergence, y = Color) +
geom_col() +
scale_x_continuous(labels = function (x) x + offset)
I'm still not 100% clear on your intended outcome but you can "shift" your data by adding/subtracting 25 from each value, e.g.
Original plot:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
Subtract 25 (using the same gapminder_subset as above):
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip()
If you combine that with my original relabelling I think that's the solution:
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(breaks = c(-25, 0, 25, 50),
labels = c(0, 25, 50, 75))
The answers that existed at the time that I'm writing this are suggesting to change the data or to change the label. Here, I'm proposing to change neither the data nor the labels, and instead just change where the starting position of a bar is.
First, for reproducibility, I took @jared_mamrot's approach for the data subset.
library(gapminder)
library(tidyverse)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
Then, you can set xmin = after_scale(25). You'll get a warning that xmin is an unknown aesthetic, but it does exist after the bars are reparameterised to rectangles in the ggplot2 internals (which happens after the x-scale has seen the data to determine limits). This effectively changes the position where the bars start.
ggplot(gapminder_subset,
aes(gdp_change, country)) +
geom_col(aes(xmin = after_scale(25)))
#> Warning: Ignoring unknown aesthetics: xmin
Created on 2021-06-28 by the reprex package (v1.0.0)
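
If the warning is a concern, a different technique that also leaves the data and the labels unchanged is to draw segments from the baseline to each value, which gives a lollipop-style chart rather than true bars. This is a swapped-in alternative, not the method from the answers above, and the toy values below are made up.

```r
library(ggplot2)

# Made-up values standing in for gapminder_subset
toy <- data.frame(
  country = c("A", "B", "C"),
  gdp_change = c(10, 40, 75)
)
baseline <- 25  # where every "bar" should start

p <- ggplot(toy) +
  # Each segment runs from the baseline to the observed value
  geom_segment(aes(x = baseline, xend = gdp_change,
                   y = country, yend = country)) +
  geom_point(aes(x = gdp_change, y = country)) +
  # Mark the baseline itself
  geom_vline(xintercept = baseline, linetype = "dashed")
```

Values below 25 extend to the left of the dashed line and values above it extend to the right, while the axis keeps showing the original numbers.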

Display `dplyr` groups ordered by their corresponding value (not name) in `geom_point`

Using the mpg dataset I want to produce a scatterplot that shows for every manufacturer one point with the grouped (by manufacturer) mean of displ.
The following works so far:
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ))) +
geom_point(aes(x = manufacturer, y = mean_displ)) +
guides(x = guide_axis(angle = 90))
Now I want to show the points in ascending order according to their displ value. Or: I want to sort the manufacturer variable on the x-axis according to the corresponding mean_displ value.
I tried to insert an arrange(mean_displ) statement in my dplyr chain, without success.
So I introduced a dummy variable x that produces the plot I want, but now the labeling is gone.
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ)) %>%
arrange(mean_displ) %>%
mutate(x = 1:15)) +
geom_point(aes(x = x, y = mean_displ))
How can I get the latter plot but with the labeling from above?
fct_reorder from the forcats package can order the levels of a factor.
library(tidyverse)
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ))) +
geom_point(aes(x = fct_reorder(manufacturer, mean_displ), y = mean_displ)) +
guides(x = guide_axis(angle = 90))
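
The reason arrange() had no effect is that ggplot2 orders a discrete axis by factor levels (alphabetically for character columns), not by row order. The sketch below, which is an illustration rather than part of the original answer, stores the reordered factor in the data and inspects its levels; the name means is made up.

```r
library(dplyr)
library(forcats)
library(ggplot2)  # for the mpg dataset and the plot

means <- mpg %>%
  group_by(manufacturer) %>%
  summarise(mean_displ = mean(displ)) %>%
  # Reorder the factor levels by the value that should drive the axis order
  mutate(manufacturer = fct_reorder(manufacturer, mean_displ))

# The level order is what the x-axis will follow
levels(means$manufacturer)

p <- ggplot(means, aes(x = manufacturer, y = mean_displ)) +
  geom_point() +
  guides(x = guide_axis(angle = 90))
```

Doing the reordering in the data rather than inside aes() also keeps the axis title as "manufacturer" instead of the full fct_reorder(...) expression.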

Add ylim to geom_col

I would like the y-axis (which is flipped in the plot) to start at some arbitrary value, like 7.5.
After a little research, I came across ylim, but in this case it gives me the following messages:
Scale for 'y' is already present. Adding another scale for 'y', which will
replace the existing scale.
Warning message:
Removed 10 rows containing missing values (geom_col).
This is my code, and a way to download the data I'm using:
install.packages("remotes")
remotes::install_github("tweed1e/werfriends")
library(werfriends)
friends_raw <- werfriends::friends_episodes
library(tidytext)
library(tidyverse)
#"best" writers with at least 10 episodes
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
ylim(7.5,10)
You should use coord_cartesian to zoom in on a particular region (see the official documentation: https://ggplot2.tidyverse.org/reference/coord_cartesian.html).
With your example, your code should look something like this:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
coord_cartesian(ylim = c(7.5,10))
If this is not working please provide a reproducible example of your dataset (see: How to make a great R reproducible example)
I found the solution. With my actual plot, the answer submitted by @dc37 didn't work because coord_flip() and coord_cartesian() exclude each other (a plot can only have one coordinate system). So the way to do this is:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(mean_rating) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
theme(legend.position = "None") +
coord_flip(ylim = c(8,8.8))
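
The difference between the two approaches can be shown on a small made-up data set, since the werfriends data has to be installed from GitHub. As far as I understand it, ylim() sets scale limits, and values outside them (here the base of every bar at 0) are turned into NA and dropped, which is where the "Removed 10 rows" warning comes from. Limits given to the coordinate system only zoom the view. The second plot assumes ggplot2 3.3.0 or later, where geom_col() can be drawn horizontally by swapping x and y.

```r
library(ggplot2)

# Made-up ratings standing in for the summarised writers data
toy <- data.frame(
  writer = c("w1", "w2", "w3"),
  mean_rating = c(8.2, 8.5, 8.7)
)

# Zoom through the coordinate system: the bars still start at 0, only the view changes
p_flip <- ggplot(toy, aes(x = writer, y = mean_rating)) +
  geom_col() +
  coord_flip(ylim = c(8, 8.8))

# Same idea without coord_flip(): map the rating to x and zoom with coord_cartesian()
p_zoom <- ggplot(toy, aes(x = mean_rating, y = writer)) +
  geom_col() +
  coord_cartesian(xlim = c(8, 8.8))
```

Keep in mind that a bar chart whose axis does not start at 0 exaggerates differences between the bars, so points may be a safer choice for a zoomed range.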

Annotate facet plot with grouped variables

I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...
You need to pass position_dodge() to geom_text to match the position of the boxes, and also set the data argument so that each distinct value of N is drawn only once per group:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
facet_grid(.~cyl) +
geom_text(data = dplyr::distinct(exmp, N),
aes(y = 6, label = N), position = position_dodge(0.9))
One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(am = as.factor(am),
gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
group = interaction(gear, am))) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(data = exmp %>% distinct(cyl, gear, am, N),
aes(y = 6, label = N),
position = position_dodge(width = 0.8))
Here's the same chart with overplotting:
Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
facet_grid(.~cyl)
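
A compact variant of the same idea is to build the label data with count(), which returns exactly one row per cyl/am/gear combination, and pass it to geom_text(). This is a sketch rather than a definitive recipe: the dodge width of 0.75 is a guess that may need tuning, and the names cars and labels are made up.

```r
library(dplyr)
library(ggplot2)

# Convert the grouping variables to factors so they dodge and fill as categories
cars <- mtcars %>%
  mutate(am = factor(am), gear = factor(gear))

# One label row per cyl/am/gear combination, with the group size in N
labels <- cars %>%
  count(cyl, am, gear, name = "N")

p <- ggplot(cars, aes(x = am, y = wt, fill = gear)) +
  geom_boxplot() +
  # group = gear makes the text dodge by gear, like the boxes
  geom_text(data = labels,
            aes(y = 6, label = N, group = gear),
            position = position_dodge(width = 0.75)) +
  facet_grid(. ~ cyl)
```

Because labels has a single row per box, each count is drawn once, which avoids the overplotting mentioned above.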

Resources