Remove facets from ggplot and instead display on a single chart? - r

Here is a plot that is very similar to one that I made for a stakeholder:
diamonds %>%
group_by(cut, color) %>%
summarise(av_price = mean(price)) %>%
ggplot(aes(color)) +
geom_bar(aes(weight = av_price)) +
facet_wrap(cut ~ .)
Looks like:
I've been asked to remove the facets and instead display each cut on the same chart but with some space between each (and perhaps each with their own color for readability?)
I do not know how to get this done. Tried:
diamonds %>%
group_by(cut, color) %>%
summarise(av_price = mean(price)) %>%
ggplot(aes(color, cut)) +
geom_bar(aes(weight = av_price))
Error: stat_count() can only have an x or y aesthetic.
How can I display each cut on a single chart as opposed to facets?

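How about this solution, which keeps the facets but lays them out in a single row, hides the strip labels, and fills each cut with its own color: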
diamonds %>%
group_by(cut, color) %>%
summarise(av_price = mean(price)) %>%
ggplot(aes(color, av_price, fill=cut)) +
geom_col(position="dodge") +
facet_wrap(~cut, nrow=1) +
theme(strip.text.x = element_blank())
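If the facets have to go entirely, one possible alternative is to put cut on the x-axis and dodge the colors within each cut. This is only a sketch, not the approach from the answer above: the object names avg_price and p_single are made up for illustration, and fill is mapped to color (rather than cut) so that the legend still identifies the bars.

```r
library(dplyr)
library(ggplot2)

# Average price per cut/color combination, as in the question
avg_price <- diamonds %>%
  group_by(cut, color) %>%
  summarise(av_price = mean(price), .groups = "drop")

# One panel: cuts on the x-axis, colors dodged side by side within each cut.
# The bar group is narrower than the category slot, which leaves a gap between cuts.
p_single <- ggplot(avg_price, aes(x = cut, y = av_price, fill = color)) +
  geom_col(position = position_dodge(width = 0.9), width = 0.8) +
  labs(x = "cut", y = "average price", fill = "color")

p_single
```

The gap between cuts comes from the dodge width being smaller than 1; lowering it further should widen the space between groups.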

Related

shading under geom_step with discrete x-axis, respecting the factor order

I'd like to shade the area under a geom_step() curve on a plot with a discrete and ordered x-axis, e.g. to show the cumulative distribution for some frequency-ordered categories.
The basic geom_step() curve could be created like this:
library(dplyr)
library(ggplot2)
library(forcats)
diamonds %>%
group_by(color) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf=cumsum(frac_of_tot),
color=fct_reorder(color, ecdf)) %>%
ggplot(aes(x=color, y=ecdf, group=0)) +
geom_step() +
expand_limits(y=0) +
labs(title="a pareto-style cumulative distribution chart",
subtitle="with x-axis ordered by decreasing frequency",
y="cumulative fraction of total") +
theme_minimal()
But adding the shaded area using geom_rect(), as taught by this answer, seems to re-order the x-axis, resulting in a nonsensical plot:
diamonds %>%
group_by(color) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf=cumsum(frac_of_tot),
color=fct_reorder(color, ecdf)) %>%
ggplot(aes(x=color, y=ecdf, group=0)) +
geom_step() +
geom_rect(aes(xmin=color, xmax=lead(color), ymin=0, ymax=ecdf), alpha=0.3) +
expand_limits(y=0) +
labs(title="A sudden mess after adding geom_rect",
subtitle="with x-axis surprisingly back in alpha order",
y="cumulative fraction of total") +
theme_minimal()
Why is the geom_rect() layer causing the x-axis to be re-ordered?
How can I produce a plot that looks just like the first one, but with the area under the curve shaded?
It seems to me that doing this with geom_rect() is doing it the hard way. With some minor data reshaping you can simply use geom_area():
library(dplyr)
library(ggplot2)
library(forcats)
library(tidyr)
diamonds %>%
group_by(color) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf = cumsum(frac_of_tot),
ecd = lag(ecdf),
color = fct_reorder(color, ecdf)) %>%
pivot_longer(starts_with("ecd")) %>%
arrange(color, name) %>%
ggplot(aes(x = color, y = value, group = 0)) +
geom_area(position = "identity", color = "black", alpha = 0.5) +
expand_limits(y = 0) +
labs(title = "a pareto-style cumulative distribution chart",
subtitle = "with x-axis ordered by decreasing frequency",
y = "cumulative fraction of total") +
theme_minimal()
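To see why the reshaping gives a stepped outline, it may help to look at the data that reaches geom_area(). The sketch below repeats the same reshaping steps and prints the result; the name step_data is made up for illustration, and count() is used as a shorthand for group_by() plus summarize().

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)  # provides the diamonds data set

step_data <- diamonds %>%
  count(color, name = "count") %>%
  arrange(desc(count)) %>%
  mutate(ecdf = cumsum(count / sum(count)),
         ecd = lag(ecdf),                      # previous cumulative value
         color = fct_reorder(color, ecdf)) %>%
  pivot_longer(starts_with("ecd")) %>%         # picks up both ecd and ecdf
  arrange(color, name)                         # "ecd" sorts before "ecdf"

# Each color now has two rows: the previous cumulative value, then its own
print(step_data, n = 6)
```

Because every category appears twice at the same x position, the area outline runs horizontally between categories and jumps vertically at each one. The first row has a missing value (there is no previous category), which ggplot2 is expected to drop with a warning.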

grouped dataframe - groupings as facets in a ggplot?

Some data
grp_diamonds <- diamonds %>%
group_by(cut) %>%
group_split
grp_diamonds[[1]] %>%
ggplot(aes(x = carat, y = price)) +
geom_point()
This returns a plot for grp_diamonds[[1]]
But grp_diamonds is actually a list of 5 dataframes since I used group_split() earlier.
Is there a clever way to automatically use the groups as facets?
Yes, in this example you could just do this:
diamonds %>%
ggplot(aes(x = carat, y = price)) +
geom_point() +
facet_wrap(vars(cut))
But I wondered if there was a way to automatically facet based on existing groupings?
Making use of dplyr::groups(), vars() and the !!! splice operator, one option would be:
library(dplyr)
library(ggplot2)
grp_diamonds <- diamonds %>%
group_by(cut, color)
grp_diamonds %>%
ggplot(aes(x = carat, y = price)) +
geom_point() +
facet_wrap(facets = vars(!!!groups(grp_diamonds)))
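A possible variation, offered as a sketch rather than as part of the answer above: dplyr::group_vars() returns the grouping columns as a character vector, and facet_wrap() also accepts a character vector, so the splicing can be avoided. The name p_facets is made up for illustration.

```r
library(dplyr)
library(ggplot2)

grp_diamonds <- diamonds %>%
  group_by(cut, color)

# group_vars() gives c("cut", "color"); facet_wrap() takes that directly
p_facets <- grp_diamonds %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point() +
  facet_wrap(group_vars(grp_diamonds))

p_facets
```

Either way the grouped data frame has to be referenced twice, once as the plot data and once to look up its groups.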

How to Combine 2 Line Graphs Together

I'm new to using R so please bear with me as my code might not look the best. So I want to combine these two line graphs together since right now I have written code for each item that I am analyzing. This is the dataset I am using: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-01/readme.md I used the "Arable_Land" dataset!
##USA Arable Land
plot_arable_land_USA <- arable_land %>%
filter(Code == "USA") %>%
select(c(Year, Code, `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`)) %>%
pivot_longer(-c(Year, Code)) %>%
ggplot(aes(x = Year, y = value,color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')
ggplotly(plot_arable_land_USA)
##Canada Arable Land
plot_arable_land_CAN <- arable_land %>%
filter(Code == "CAN") %>%
select(c(Year, Code, `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`)) %>%
pivot_longer(-c(Year, Code)) %>%
ggplot(aes(x = Year, y = value,color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')
ggplotly(plot_arable_land_CAN)
Ideally, I would like one graph to show both: one line (in purple) for the USA and another line (in brown) for Canada.
Thank you!
Try this. It is good practice to reshape the data to long format, as you did. In your case you can add filter() to choose the desired countries, then reshape to long and build the plot. The key is mapping both color and group to Code in order to obtain one line per country. You can set the colors using scale_color_manual(); I have left the facet in place to keep the title. Here is the code:
library(plotly)
library(tidyverse)
#Code
plot_arable_land_CAN <- arable_land %>% select(-Entity) %>%
filter(Code %in% c('USA','CAN')) %>%
pivot_longer(-c(Code,Year)) %>%
ggplot(aes(x = Year, y = value,color=Code,group=Code)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')+
scale_color_manual(values = c('brown','purple'))
#Transform
ggplotly(plot_arable_land_CAN)
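The unnamed color vector above relies on the alphabetical order of the codes (CAN first, then USA). A named vector makes the mapping explicit. The sketch below uses a small made-up data frame, toy_land, whose values are purely illustrative and not taken from the arable_land data set.

```r
library(ggplot2)

# Made-up values, only for illustration; not the real arable_land data
toy_land <- data.frame(
  Year  = rep(c(1961, 1990, 2014), times = 2),
  Code  = rep(c("USA", "CAN"), each = 3),
  value = c(1.00, 0.70, 0.55, 1.00, 0.75, 0.60)
)

# Names in `values` tie each color to a country code regardless of order
p_named <- ggplot(toy_land, aes(x = Year, y = value, color = Code, group = Code)) +
  geom_line() +
  scale_color_manual(values = c(USA = "purple", CAN = "brown")) +
  theme_light()

p_named
```

With named values, adding or removing a country should not silently shift the colors of the remaining lines.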

ggplot bar chart limits fix

I am trying to fix the limits of a bar chart so the horizontal bar doesn't go over the plot area. I could set the limit manually using limits=c(0,3000000), but I guess there is a way to make it automatically scalable. The code:
corona.conf <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header = TRUE,check.names=FALSE)
corona.conf %>% .[,c(-1,-3,-4)] %>% melt(.,variable.name="day") %>%
group_by(`Country/Region`,day) %>% summarize(value=sum(value)) %>%
mutate(day=as.Date(day,format='%m/%d/%y')) %>% mutate(count=value-lag(value)) %>%
replace(is.na(.),0) %>% group_by(`Country/Region`) %>% summarize(count=sum(count)) %>%
top_n(20) %>% arrange(desc(count)) %>% ggplot(.,aes(x=reorder(`Country/Region`,count),y=count,fill=count)) +
geom_bar(stat = "identity") + coord_flip() + geom_text(aes(label=format(count,big.mark = ",")),hjust=-0.1,size=4) +
scale_y_continuous(expand = c(0,0))
I thought something like:
scale_y_continuous(expand = c(0,0),limits=c(0,max(count)))
Appreciate any suggestions on the fix.
I think it would be easier to read and run the code by splitting it into several parts.
We can use layer_data() to get the information from a ggplot object, and then calculate the maximum from that. Based on your example, I would also suggest multiplying the maximum by 1.7 to leave room for the text labels.
library(tidyverse)
library(data.table)
corona.conf <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header = TRUE,check.names=FALSE)
dat <- corona.conf %>% .[,c(-1,-3,-4)] %>% melt(.,variable.name="day") %>%
group_by(`Country/Region`,day) %>% summarize(value=sum(value)) %>%
mutate(day=as.Date(day,format='%m/%d/%y')) %>% mutate(count=value-lag(value)) %>%
replace(is.na(.),0) %>% group_by(`Country/Region`) %>% summarize(count=sum(count)) %>%
top_n(20) %>% arrange(desc(count))
p <- ggplot(dat, aes(x=reorder(`Country/Region`,count),y=count,fill=count)) +
geom_bar(stat = "identity") +
coord_flip() +
geom_text(aes(label=format(count,big.mark = ",")),hjust=-0.1,size=4)
p +
scale_y_continuous(expand = c(0,1), limits = c(0, max(layer_data(p)$y) * 1.7))
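Another way to get automatic headroom, assuming ggplot2 3.3.0 or later, is expansion(), which pads the axis relative to the data range instead of requiring explicit limits. The sketch below uses a made-up data frame, toy_counts, and the 25% padding is an arbitrary choice that may need tuning for longer labels.

```r
library(ggplot2)

# Made-up counts, only for illustration
toy_counts <- data.frame(
  country = c("A", "B", "C"),
  count   = c(2500000, 1200000, 400000)
)

p_expand <- ggplot(toy_counts,
                   aes(x = reorder(country, count), y = count, fill = count)) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = format(count, big.mark = ",")), hjust = -0.1, size = 4) +
  # no padding at zero, 25% of the data range added beyond the longest bar
  scale_y_continuous(expand = expansion(mult = c(0, 0.25)))

p_expand
```

Since the padding is multiplicative, it scales with the data and does not require building the plot first to look up the maximum.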

ggplot prioritize line overlap

library(tidyverse)
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line() +
theme_classic()
In the image below the red 'violation' line is broken up into segments. I assume this is because ggplot is plotting lines sequentially, and one of the grey lines is plotted after the red line, with the same coordinates. How do I stop the grey lines from overlapping the red?
As referenced in other Stack Overflow questions, I would add a separate line like this:
geom_line(df %>% filter(violation == TRUE), aes(color = "red")) +
but this causes problems when there are no violations in my data frames. I do monthly analyses and some months contain violations, some months do not. If I add this single line above I get an error "must be length greater than 0" for the months absent violations, so this one-liner approach probably won't work.
You might use the following code (with only 2 minor changes compared to your code):
library(tidyverse)
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line(alpha = .5, size= 1.2) + ### changes in transparency and thickness ###
theme_classic()
Yielding this plot:
"H 1"'s suggestion is an alternative approach which changes the sequence of line drawings:
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line(aes(group = rev(key))) + ### changes sequence of plotting of the lines ###
theme_classic()
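A variation on the same idea that does not depend on the row order of the data: reorder the grouping variable so that any key containing a violation gets the highest factor level. This is a sketch under two assumptions: that geom_line() draws groups in the order of the grouping factor's levels (which the rev(key) approach above also appears to rely on), and that only "carb" is flagged, as in the question's example. The names plot_data, draw_order and p_order are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

plot_data <- mtcars %>%
  mutate(ID = row_number()) %>%
  select(ID, vs, am, gear, carb) %>%
  pivot_longer(-ID, names_to = "key", values_to = "value") %>%
  # assumption: as in the question's example, only "carb" is flagged
  mutate(violation = key == "carb",
         # keys with any violation end up as the last factor level
         draw_order = fct_reorder(key, as.integer(violation), .fun = max))

p_order <- ggplot(plot_data,
                  aes(ID, value, group = draw_order, color = violation)) +
  scale_color_manual(values = c("FALSE" = "grey", "TRUE" = "red")) +
  geom_line() +
  theme_classic()

p_order
```

When a month has no violations, every key ties at zero and the level order is simply left unchanged, so the same code should run without a special case.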
