GGplot order legend using last values on x-axis - r

I have some time series data plotted using ggplot. I'd like the legend, which appears to the right of the plot, to be in the same order as the line on the most recent date/value on the plot's x-axis. I tried using the case_when function, but I'm obviously using it wrong. Here is an example.
df <- tibble(
x = runif(100),
y = runif(100),
z = runif(100),
year = sample(seq(1900, 2010, 10), 100, T)
) %>%
gather(variable, value,-year) %>%
group_by(year, variable) %>%
summarise(mean = mean(value))
df %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
## does not work
df %>%
mutate(variable = fct_reorder(variable, case_when(mean ~ year == 2010)))
ggplot(aes(year, mean, color = variable)) +
geom_line()

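We may add one extra step before plotting:
ungroup() %>% mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE))
so that the full pipeline becomes
df %>%
ungroup() %>%
mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE)) %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
In this way we look at the last value of mean for each variable (the rows are already sorted by year after summarise()) and reorder variable accordingly. The ungroup() matters because summarise() leaves df grouped by year, and a grouped mutate() would reorder the factor within each year separately.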

There's another way without adding a new column using fct_reorder2():
library(tidyverse)
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean))) +
geom_line() +
labs(color = "variable")
Although it isn't what you want here, you can instead order the legend by the first (earliest) values in your plot by setting .fun = first2:
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean, .fun = first2))) +
geom_line() +
labs(color = "variable")
The default is .fun = last2 (see also https://forcats.tidyverse.org/reference/fct_reorder.html)
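To see what the two helpers do to the factor levels, here is a minimal sketch on made-up data (the toy data frame and its values are hypothetical, not the asker's data). It only inspects the resulting level order, which is what drives the legend order.

```r
library(forcats)

# Hypothetical toy data: three series observed in two years.
toy <- data.frame(
  year     = rep(c(2000, 2010), each = 3),
  variable = rep(c("x", "y", "z"), times = 2),
  mean     = c(0.9, 0.5, 0.1,   # values in 2000
               0.2, 0.8, 0.4)   # values in 2010
)

# Default .fun = last2: levels sorted by the value at the largest year, descending.
by_last <- levels(fct_reorder2(toy$variable, toy$year, toy$mean))
by_last   # "y" "z" "x"

# .fun = first2: levels sorted by the value at the smallest year, descending.
by_first <- levels(fct_reorder2(toy$variable, toy$year, toy$mean, .fun = first2))
by_first  # "x" "y" "z"
```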

Related

x-axis starting value for diverging plot

How can I change the "x-axis starting value" of the diverging bar chart below (extracted from here) so that the vertical axis sits at 25 instead of 0, and the bars are therefore drawn from 25 rather than from 0?
For instance, I want this chart:
To look like this:
EDIT
It is not the label I want to change, it is how the data is plotted. My apologies if I wasn't clear. See example below:
Another example to make it clear:
You can provide computed labels to an (x-)scale via scale_x_continuous(labels = function (x) x + 25).
If you also want to change the data, you’ll first need to offset the x-values by the equivalent amount (in the opposite direction):
Example:
df = tibble(Color = c('red', 'green', 'blue'), Divergence = c(5, 10, -5))
offset = 2
df %>%
mutate(Divergence = Divergence - offset) %>%
ggplot() +
aes(x = Divergence, y = Color) +
geom_col() +
scale_x_continuous(labels = function (x) x + offset)
I'm still not 100% clear on your intended outcome but you can "shift" your data by adding/subtracting 25 from each value, e.g.
Original plot:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
subtract 25:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip()
If you combine that with my original relabelling I think that's the solution:
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(breaks = c(-25, 0, 25, 50),
labels = c(0, 25, 50, 75))
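As a variation, the labels do not have to be typed out by hand; they can be computed from the breaks, as in the first answer. The sketch below assumes a hypothetical toy data frame in place of gapminder_subset, and the names baseline and relabel are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for gapminder_subset.
toy <- data.frame(country = c("A", "B", "C"), gdp_change = c(10, 40, 70))

baseline <- 25                            # value at which the bars should start
relabel  <- function(y) y + baseline      # maps shifted positions back to original values

p <- ggplot(toy, aes(x = country, y = gdp_change - baseline)) +
  geom_col() +
  coord_flip() +
  # Shift the data down by the baseline, then add it back in the labels.
  scale_y_continuous(breaks = c(-25, 0, 25, 50), labels = relabel)

relabel(c(-25, 0, 25, 50))  # 0 25 50 75, the same labels as hard-coded above
```

Keeping the offset in one variable means the data shift and the labels cannot drift apart if the baseline changes.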
The answers that existed at the time that I'm writing this are suggesting to change the data or to change the label. Here, I'm proposing to change neither the data nor the labels, and instead just change where the starting position of a bar is.
First, for reproducibility, I took @jared_mamrot's approach for the data subset.
library(gapminder)
library(tidyverse)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
Then, you can set xmin = after_scale(25). You'll get a warning that xmin doesn't exist, but it does exist after the bars are reparameterised to rectangles in the ggplot2 internals (which is after the x-scale has seen the data to determine limits). This effectively changes the position where bars start.
ggplot(gapminder_subset,
aes(gdp_change, country)) +
geom_col(aes(xmin = after_scale(25)))
#> Warning: Ignoring unknown aesthetics: xmin
Created on 2021-06-28 by the reprex package (v1.0.0)

show gap for missing date in geom area

I'd like to plot the time series of my data. However, there are some gaps in the dates, as in the example below. The following code produces the plot disregarding the missing dates. How can I show the missing dates, i.e. show a gap between 2021-01-02 and 2021-01-04 and similarly between 2021-01-06 and 2021-01-08?
library(tidyverse)
fake.data <- data.frame(
varA = c(0.6,0.5,0.2,0.3,0.7),
varB = c(0.1,0.2,0.4,0.6,0.2),
varC = c(0.3,0.3,0.4,0.1,0.1),
start_date = as.Date(c('2021-01-01','2021-01-02','2021-01-04','2021-01-06','2021-01-08')),
stringsAsFactors = FALSE
)
fake.data %>%
gather(variable, value,varA:varC) %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area()
I guess the easiest would be to fake the gaps, e.g., with geom_rect.
Consider that "gaps in data" are actually inherent to most use of line / area graphs - some purists might actually be totally against showing lines / areas for non-continuous measurements, because it suggests continuous measurements. Thus, because it is interpolated anyways, you could argue that you might as well not need to show those gaps.
library(tidyverse)
fake.data <- data.frame(
varA = c(0.6,0.5,0.2,0.3,0.7),
varB = c(0.1,0.2,0.4,0.6,0.2),
varC = c(0.3,0.3,0.4,0.1,0.1),
start_date = as.Date(c('2021-01-01','2021-01-02','2021-01-04','2021-01-06','2021-01-08'))
) %>% pivot_longer(cols = matches("^var"), names_to = "variable", values_to = "value" )
ls_data <- setNames(fake.data %>%
complete(start_date = full_seq(start_date, 1)) %>%
split(., is.na(.$variable)), c("vals", "missing"))
ggplot(ls_data$vals, aes(x = start_date, y = value, fill = variable)) +
geom_area() +
geom_rect(data = ls_data$missing, aes(xmin = start_date-.5, xmax = start_date+.5,
ymin = 0, ymax = Inf), fill = "white") +
theme_classic()
Created on 2021-04-21 by the reprex package (v2.0.0)
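The step that makes the gaps visible is complete() with full_seq(), which adds a row of NA values for every date that is absent. A minimal sketch on a shortened, hypothetical version of the data shows the effect in isolation:

```r
library(tidyr)

# Hypothetical shortened data: 2021-01-03 is missing.
toy <- data.frame(
  start_date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04")),
  value      = c(0.6, 0.5, 0.2)
)

# full_seq(start_date, 1) builds the daily sequence from the first to the last date;
# complete() inserts rows for dates not present, with NA in the other columns.
filled <- complete(toy, start_date = full_seq(start_date, 1))
filled
```

The rows with NA are the ones the answer above splits off into ls_data$missing and covers with white rectangles.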
Considering the above - I'd possibly favour not explicitly showing the gaps, but to show the measurements more explicitly. E.g., with geom_point.
fake.data %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area() +
geom_point(position = "stack") +
geom_line(position = "stack")
Is this close to what you want?
todateseq<-fake.data %>%
select(start_date) %>%
pull
first <- min(todateseq)
last <- max(todateseq)
date_seq <- seq.Date(first,last,by='day')
fake.data2 <- data.frame(start_date=date_seq) %>%
left_join(fake.data,by='start_date')
fake.data2 %>%
gather(variable, value,varA:varC) %>%
mutate(value=ifelse(is.na(value),0,value)) %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area(na.rm = F,position = position_stack())

Add 2 additional lines to a ggplot

I have successfully plotted my dat below. However, I want to do the following data transformation: dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm) and then ADD 2 lines to my current plot:
geom_smooth() using x.cm as x
geom_smooth() using x.cwc as x
Is there a way to do this?
p.s: Is it also possible to display the 3 unique(x.cm) values as 3 stars on the plot? (see pic below)
library(tidyverse)
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/cw.csv')
dat %>% group_by(groups) %>% ggplot() +
aes(x, y, color = groups, shape = groups)+
geom_point(size = 2) + theme_classic()+
stat_ellipse()
# Now do the transformation:
dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm)
I'm not sure the second part of your question makes sense to me. But from the description, one way is to simply add a layer where you alter your data and aes arguments, as I do in the example below (using mtcars as example data).
# Load libraries and data
library(tidyverse)
library(ggplot2)
data(mtcars)
mtcars %>% ggplot(aes(x = hp, y = mpg)) +
geom_point(aes(col = factor(cyl))) +
stat_ellipse(aes(col = factor(cyl))) +
# Add line for ellipsis center
geom_line(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
mean_y = mean(mpg),
.groups = 'drop'),
mapping = aes(x = mean_x, y = mean_y)) +
geom_point(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
mean_y = mean(mpg),
.groups = 'drop'),
mapping = aes(x = mean_x, y = mean_y)) +
# Add smooth for.. what? I don't understand this part of the question.
geom_smooth(data = mtcars %>% group_by(cyl) %>% mutate(x_val = hp - mean(hp)) %>% ungroup(),
mapping = aes(x = x_val, y = mpg))
Now it should be quite clear which part does not make sense to me. What do you mean by the second path (geom_smooth), and why do you want it? Moving the x-axis on the smoother makes no sense to me. Also, I took the liberty of changing the definition of the first part, by instead adding the single points of the mean (center of the ellipses) to the plot and connecting them using geom_line.
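For reference, the transformation from the question (a group mean x.cm and a within-group centered value x.cwc) can be checked on a small example. The toy data frame below is hypothetical and only stands in for dat.

```r
library(dplyr)

# Hypothetical toy data standing in for dat.
toy <- data.frame(
  groups = rep(c("a", "b"), each = 3),
  x      = c(1, 2, 3, 10, 20, 30)
)

centered <- toy %>%
  group_by(groups) %>%
  mutate(x.cm  = mean(x),     # cluster mean, constant within each group
         x.cwc = x - x.cm) %>% # deviation of each point from its group mean
  ungroup()

centered
```

Because x.cwc is centered within each group, it averages to zero per group, so a smoother drawn against it lives on a different x range than the original points.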

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequencies of the data so that it counts across the row (not down a column) for each index (s1,s2,s3,s4) and then for each group (g1,g2) of each taxa. I'm only able to figure out how to graph the species of one taxa but not all three stacked on each other.
Here are some examples of what I'm trying to make:
These were made on google sheets so they don't look like ggplot but it would be easier to make in r with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
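The reshaping step can also be written with pivot_longer(), which tidyr offers as the newer interface to gather(). This is a sketch of the same idea rather than a replacement for the answer above; it rebuilds dataf from the question and uses base R's ave() for the per-index totals.

```r
library(tidyr)

dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = TRUE, sep = ",")

# One row per index/taxa combination instead of one column per taxa.
long <- pivot_longer(dataf, cols = taxa1:taxa3,
                     names_to = "taxa", values_to = "value")

# Proportion of each taxa within its index (row of the original table).
long$prop <- long$value / ave(long$value, long$index, FUN = sum)

long
```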
## You need to reshape the data before plotting.
library(reshape2) # provides melt()
library(ggplot2)
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) +
# Apply the complete theme first; otherwise it would reset the axis text settings.
theme_gray() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))

Ordering in geom_dotplot with 2 variables

I am using dotplot to do an analysis with two y variables per x variable. I'd like to arrange the chart so that it descends by one of the y variables. I used the reorder() function in the aes() and it reorders it slightly, but not entirely. Chart 1 is what it looks like before, and chart 2 is what it looks like after I use reorder().
Chart 1:
Chart 2:
Here's the code:
answers %>%
ggplot(aes(x = reorder(locale, -percent) , y = percent, fill = box)) +
geom_dotplot(binaxis='y',
stackdir='center',
dotsize = 1,
binwidth = 0.01) +
geom_errorbar(aes(ymin = ci_lo, ymax = ci_hi), width = .5, position = position_dodge(0))
And this is what the "answers" df looks like. The two variables being plotted per locale are in the "box" column - there's a top_box and bottom_box row for each locale:
As pointed out in the comments, you do not provide any data, but I think I have an idea of where you're going wrong.
Here is some example data. I'm going to use a modified mtcars for the example where we will look at the min and max weight of the cars by make.
library(tidyverse)
df <- mtcars %>% rownames_to_column() %>%
select(car = rowname, wt) %>%
mutate(car = gsub("\\s.*?$", "", car)) %>%
group_by(car) %>%
mutate(n = n()) %>%
filter(n > 1) %>%
arrange(car,wt) %>%
filter(row_number() == max(row_number()) | row_number() == min(row_number())) %>%
select(-n) %>%
ungroup() %>%
mutate(stat = rep(c("min", "max"), nrow(.)/2)) %>%
spread(stat, wt)
print(df)
# car max min
# Fiat 2.2 1.94
# Hornet 3.44 3.22
# Mazda 2.88 2.62
# Merc 4.07 3.15
# Toyota 2.46 1.84
Here is what the plot for that would look like:
df %>%
ggplot(aes(x = car))+
geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
Now let's talk about what you're trying to do. You say that you would like to order by descending on one of your variables.
df %>%
arrange(-max)%>%
mutate(car = factor(car, levels = car))%>%
ggplot(aes(x = car))+
geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
or
df %>%
arrange(-min)%>%
mutate(car = factor(car, levels = car))%>%
ggplot(aes(x = car))+geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
I think the key here is that you want to arrange the data and then set the factor levels to get the desired output. If your data is not a factor, then ggplot will use alphabetical order. You may need to spread your data in order to use the exact method outlined above.
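The effect of setting the levels can be seen without plotting at all. This base R sketch uses three of the values printed above; the object names are made up for illustration.

```r
# Three rows taken from the printed df above.
toy <- data.frame(car = c("Fiat", "Merc", "Mazda"), max = c(2.2, 4.07, 2.88))

# Without explicit levels, factor() sorts the labels alphabetically.
levels(factor(toy$car))    # "Fiat" "Mazda" "Merc"

# Sort the rows first, then use the sorted labels as the levels.
sorted <- toy[order(-toy$max), ]
sorted$car <- factor(sorted$car, levels = sorted$car)
levels(sorted$car)         # "Merc" "Mazda" "Fiat"
```

ggplot2 lays out a discrete axis in level order, which is why the second form gives a descending axis.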
Update
You could do this without spreading your data, by arranging with two variables.
Here we will modify the data above to long format
df2 <- df %>% gather(measure, value, -car)
Which plots like this
df2 %>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
and then we can arrange without spreading
df2 %>%
arrange(-value, measure) %>%
mutate(car = factor(car, levels = unique(car)))%>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
or for descending by min
df2 %>%
arrange(desc(measure), -value) %>%
mutate(car = factor(car, levels = unique(car)))%>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
