x-axis starting value for diverging plot - r

How can I change the "x-axis starting value" from the diverging bar chart below (extracted from here), so that the vertical axis is set at 25 instead of 0. And therefore the bars are drawn from 25 and not 0.
For instance, I want this chart:
To look like this:
EDIT
It it not the label I want to change, it is how the data is plotted. My apologies if I wasn't clear. See example below:
Another example to make it clear:

You can provide computed labels to an (x-)scale via scale_x_continuous(labels = function (x) x + 25).
If you also want to change the data, you’ll first need to offset the x-values by the equivalent amount (in the opposite direction):
Example:
df = tibble(Color = c('red', 'green', 'blue'), Divergence = c(5, 10, -5))
offset = 2
df %>%
mutate(Divergence = Divergence - offset) %>%
ggplot() +
aes(x = Divergence, y = Color) +
geom_col() +
scale_x_continuous(labels = function (x) x + offset)

I'm still not 100% clear on your intended outcome but you can "shift" your data by adding/subtracting 25 from each value, e.g.
Original plot:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
subtract 25:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
If you combine that with my original relabelling I think that's the solution:
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(breaks = c(-25, 0, 25, 50),
labels = c(0, 25, 50, 75))

The answers that existed at the time that I'm writing this are suggesting to change the data or to change the label. Here, I'm proposing to change neither the data nor the labels, and instead just change where the starting position of a bar is.
First, for reproducibility, I took #jared_mamrot's approach for the data subset.
library(gapminder)
library(tidyverse)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
Then, you can set xmin = after_scale(25). You'll get a warning that xmin doesn't exists, but it does exist after the bars are reparameterised to rectangles in the ggplot2 internals (which is after the x-scale has seen the data to determine limits). This effectively changes the position where bars start.
ggplot(gapminder_subset,
aes(gdp_change, country)) +
geom_col(aes(xmin = after_scale(25)))
#> Warning: Ignoring unknown aesthetics: xmin
Created on 2021-06-28 by the reprex package (v1.0.0)

Related

Difference between geom_col() and geom_point() for same value

So, I'm trying to plot missing values here over time (longitudinal data).
I would prefer placing them in a geom_col() to fill up with colours of certain treatments afterwards. But for some weird reason, geom_col() gives me weird values, while geom_point() gives me the correct values using the same function. I'm trying to wrap my head around why this is happening. Take a look at the y-axis.
Disclaimer:
I know the missing values dissappear on day 19-20. This is why I'm making the plot.
Sorry about the lay-out of the plot. Not polished yet.
For the geom_point:
gaussian_transformed %>% group_by(factor(time)) %>% mutate(missing = sum(is.na(Rose_width))) %>% ggplot(aes(x = factor(time), y = missing)) + geom_point()
Picture: geom_point
For the geom_col:
gaussian_transformed %>% group_by(factor(time)) %>% mutate(missing = sum(is.na(Rose_width))) %>% ggplot(aes(x = factor(time), y = missing)) + geom_col()
Picture: geom_col
The problem is that you're using mutate and creating several rows for your groups. You cannot see that, but you will have plenty of points overlapping in your geom_point plot.
One way is to either use summarise, or you use distinct
Compare
library(tidyverse)
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_point()
The points look ugly because there is a lot of over plotting.
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
distinct(order, .keep_all = TRUE) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
Created on 2021-06-02 by the reprex package (v2.0.0)
So after some digging:
What happens was that the geom_col() function sums up all the missing values while geom_point() does not. Hence the large values for y. Why this is happening, I do not know. However doing the following worked fine for me:
gaussian_transformed$time <- as.factor(gaussian_transformed$time)
gaussian_transformed %>% group_by(time) %>% summarise(missing = sum(is.na(Rose_width))) -> gaussian_transformed
gaussian_transformed %>% ggplot(aes(x = time, y = missing)) + geom_col(fill = "blue", alpha = 0.5) + theme_minimal() + labs(title = "Missing values in Gaussian Outcome over the days", x = "Time (in days)", y = "Amount of missing values") + scale_y_continuous(breaks = seq(0, 10, 1))
With the plot: GaussianMissing

show gap for missing date in geom area

I like to plot the time series of my data. However there are some gaps in the date value like in the example below. The following code produces the plot disregarding the missing date. How can I show the missing date i.e. show a gap between 2021-01-02 and 2021-01-04 and similarly 2021-01-06 and 2021-01-08.
library(tidyverse)
fake.data <- data.frame(
varA = c(0.6,0.5,0.2,0.3,0.7),
varB = c(0.1,0.2,0.4,0.6,0.2),
varC = c(0.3,0.3,0.4,0.1,0.1),
start_date = as.Date(c('2021-01-01','2021-01-02','2021-01-04','2021-01-06','2021-01-08')),
stringsAsFactors = FALSE
)
fake.data %>%
gather(variable, value,varA:varC) %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area()
I guess the easiest would be to fake the gaps, e.g., with geom_rect.
Consider that "gaps in data" are actually inherent to most use of line / area graphs - some purists might actually be totally against showing lines / areas for non-continuous measurements, because it suggests continuous measurements. Thus, because it is interpolated anyways, you could argue that you might as well not need to show those gaps.
library(tidyverse)
fake.data <- data.frame(
varA = c(0.6,0.5,0.2,0.3,0.7),
varB = c(0.1,0.2,0.4,0.6,0.2),
varC = c(0.3,0.3,0.4,0.1,0.1),
start_date = as.Date(c('2021-01-01','2021-01-02','2021-01-04','2021-01-06','2021-01-08'))
) %>% pivot_longer(cols = matches("^var"), names_to = "variable", values_to = "value" )
ls_data <- setNames(fake.data %>%
complete(start_date = full_seq(start_date, 1)) %>%
split(., is.na(.$variable)), c("vals", "missing"))
ggplot(ls_data$vals, aes(x = start_date, y = value, fill = variable)) +
geom_area() +
geom_rect(data = ls_data$missing, aes(xmin = start_date-.5, xmax = start_date+.5,
ymin = 0, ymax = Inf), fill = "white") +
theme_classic()
Created on 2021-04-21 by the reprex package (v2.0.0)
Considering the above - I'd possibly favour not explicitly showing the gaps, but to show the measurements more explicitly. E.g., with geom_point.
fake.data %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area() +
geom_point(position = "stack") +
geom_line(position = "stack")
is this close to what you wish ?
todateseq<-fake.data %>%
select(start_date) %>%
pull
first <- min(todateseq)
last <- max(todateseq)
date_seq <- seq.Date(first,last,by='day')
fake.data2 <- data.frame(start_date=date_seq) %>%
left_join(fake.data,by='start_date')
fake.data2 %>%
gather(variable, value,varA:varC) %>%
mutate(value=ifelse(is.na(value),0,value)) %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area(na.rm = F,position = position_stack())

how to control the color of ggrepel segments

Going to try this again with a better MRE...for context, here's the product I'm currently trying to improve
What I'm trying to do is get the lines from the endpoints to the labels to be the same color as the data lines.
For purposes of this question we can work with this script
library(ggplot2)
library(babynames)
library(dplyr)
library(ggrepel)
library(ggsci)
data <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Mary", "Minnie")) %>%
filter(sex=="F")
data <- data %>% group_by(name) %>%
mutate(change = n - lag(n)) %>%
mutate(meanC = mean(change, na.rm = TRUE)) %>%
ungroup()
data$label <- paste(data$name,"\n",round(data$meanC,0),sep="" )
minYear = min(data$year)
maxYear = max(data$year)
#endpoint layer
Endpoints <- data %>%
group_by(name) %>%
filter(year == max(year)) %>%
select(year, name, n, label) %>%
ungroup()
namePlot <- data %>%
ggplot(mapping = aes(x=year, y=n)) +
geom_line(aes(color=name), show.legend = FALSE) +
coord_cartesian(xlim = c(minYear, maxYear+10)) +
scale_color_ucscgb() +
geom_point(data = Endpoints, size=1.5, shape=21,
aes(color=name, fill=name), show.legend=FALSE) +
geom_label_repel(data=Endpoints, aes(label=label),
color = c("forestgreen","red")[1+grepl("\\-\\d",Endpoints$label)],
show.legend = FALSE,
vjust = 0, xlim=c(maxYear+3,maxYear+10), size=3, direction='y')
print(namePlot)
which produces this plot
The colors of the labels is controlled by color = c("forestgreen","red")[1+grepl("\\-\\d",Endpoints$label)], so that, in this case, data with a positive value in the label is green and data with a negative value is red. What I'd like to is make the connecting lines from the endpoints to the label boxes be the same color as the data lines, which are controlled by geom_line(aes(color=name),show.legend = FALSE
In the ggrepel docs there is a segment.color parameter that can control the color of the line segment, but it is not an aesthetic. So it appears it has to be "hard-coded" like segment.color="red" which doesn't really help me. I also found this discussion about the issue that seemed to present a solution, but I have been unable to get it to work. Part of the issue there is that it involves scale_color_discrete(aesthetics = c("color", "segment.color")) and I already have scale_color_ucscgb() so I get a warning about replacing scales...
Any guidance would be most appreciated.
Working version based on guidance from #aosmith
library(ggplot2)
library(babynames)
library(dplyr)
library(ggrepel)
library(ggsci)
data <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Mary", "Minnie")) %>%
filter(sex=="F")
data <- data %>% group_by(name) %>%
mutate(change = n - lag(n)) %>%
mutate(meanC = mean(change, na.rm = TRUE)) %>%
ungroup()
data$label <- paste(data$name,"\n",round(data$meanC,0),sep="" )
minYear = min(data$year)
maxYear = max(data$year)
#endpoint layer
Endpoints <- data %>%
group_by(name) %>%
filter(year == max(year)) %>%
select(year, name, n, label) %>%
ungroup()
namePlot <- data %>%
ggplot(mapping = aes(x=year, y=n)) +
geom_line(aes(color=name), show.legend = FALSE) +
coord_cartesian(xlim = c(minYear, maxYear+15)) +
geom_point(data = Endpoints, size=1.5, shape=21,
aes(color=name, fill=name), show.legend=FALSE) +
geom_label_repel(data=Endpoints, aes(label=label,
segment.color=name),
color = c("forestgreen","red")[1+grepl("\\-\\d",Endpoints$label)],
show.legend = FALSE,
force = 50,
vjust = 0, xlim=c(maxYear+5,maxYear+12), size=3, direction='y') +
scale_color_discrete(aesthetics = c("color", "segment.color"))
print(namePlot)
produces

Add ylim to geom_col

I would like to see the y-axis (in the plot is flipped) starting at some arbitrary value, like 7.5
After a little bit of researching, I came across ylim, but in this case is giving me some
errors:
Scale for 'y' is already present. Adding another scale for 'y', which will
replace the existing scale.
Warning message:
Removed 10 rows containing missing values (geom_col).
This is my code, and a way to download the data I'm using:
install.packages("remotes")
remotes::install_github("tweed1e/werfriends")
library(werfriends)
friends_raw <- werfriends::friends_episodes
library(tidytext)
library(tidyverse)
#"best" writers with at least 10 episodes
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
ylim(7.5,10)
You should use coord_cartesian for zoom in a particular location (here the official documentation: https://ggplot2.tidyverse.org/reference/coord_cartesian.html).
With your example, your code should be something like that:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
coord_cartesian(ylim = c(7.5,10))
If this is not working please provide a reproducible example of your dataset (see: How to make a great R reproducible example)
I found out the solution. With my actual plot, the answer submitted by #dc37 didn't work because coord_flip() and coord_cartesian() exclude each other. So the way to do this is:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(mean_rating) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
theme(legend.position = "None") +
coord_flip(ylim = c(8,8.8))

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequences of the data so that it counts across the row (not down a column) for each index (s1,s2,s3,s4) and then for each group (g1,g2) of each taxa. I'm only able to figure out how to graph the species of one taxa but not all three stacked on each other.
Here are some examples of what I'm trying to make:
These were made on google sheets so they don't look like ggplot but it would be easier to make in r with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
##You need to reshape data before doing that.
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) + theme_gray()

Resources