How to Arrange Stacked geom_bar by Ascending Proportion

I'm looking at an R Tidy Tuesday dataset (European Energy). I have wrangled the Imports and Exports into proportions and want to arrange the ggplot bars in ascending order of the Imports value. I'm just trying to make it look tidy, but I can't seem to control the order so that each subsequent country has the next-biggest import share.
I have left a couple of earlier attempts in the code, commented out. Thanks in advance.
library(tidyverse)
country_totals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-04/country_totals.csv')
country_totals %>%
filter(!is.na(country_name)) %>%
filter(type %in% c("Imports","Exports")) %>%
group_by(country_name) %>%
mutate(country_type_ttl = sum(`2018`)) %>%
mutate(country_type_pct = `2018`/country_type_ttl) %>%
ungroup() %>%
mutate(type_hold = type) %>%
pivot_wider(names_from = type_hold, values_from = `2018`) %>%
# ggplot(aes(country_name, country_type_pct, fill = type)) +
# ggplot(aes(reorder(country_name, Imports), country_type_pct, fill = type)) +
ggplot(aes(fct_reorder(country_name, Imports), country_type_pct, fill = type)) +
geom_bar(stat = "identity") +
coord_flip()

This can be achieved by adding a column that holds the value you want to reorder by, i.e. the share of imports in 2018, using e.g. imports_2018 = country_type_pct[type == "Imports"]. Then reorder the countries according to this column:
library(tidyverse)
country_totals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-04/country_totals.csv')
country_totals %>%
filter(!is.na(country_name)) %>%
filter(type %in% c("Imports","Exports")) %>%
group_by(country_name) %>%
mutate(country_type_ttl = sum(`2018`)) %>%
mutate(country_type_pct = `2018`/country_type_ttl,
imports_2018 = country_type_pct[type == "Imports"]) %>%
ungroup() %>%
mutate(type_hold = type) %>%
ggplot(aes(fct_reorder(country_name, imports_2018), country_type_pct, fill = type)) +
geom_bar(stat = "identity") +
coord_flip()
#> Warning: Removed 2 rows containing missing values (position_stack).
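A small offline sketch of the same idea, using a hypothetical three-country table (`shares`) in place of the Tidy Tuesday download. It also shows a one-step variant (my own, not from the answer) that passes the import share straight to `fct_reorder()` and sums it per country, so no helper column is needed. One plausible reason the original `fct_reorder(country_name, Imports)` attempt misbehaved is that `pivot_wider()` leaves `Imports` as `NA` on every Exports row, so the per-country median can come out `NA` depending on how your forcats version handles missing values.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: one Imports row and one Exports row per country
shares <- tibble::tibble(
  country_name = rep(c("A", "B", "C"), each = 2),
  type = rep(c("Imports", "Exports"), times = 3),
  country_type_pct = c(0.7, 0.3, 0.2, 0.8, 0.5, 0.5)
)

# Answer's approach: copy each country's import share onto both of its rows
with_helper <- shares %>%
  group_by(country_name) %>%
  mutate(imports_2018 = country_type_pct[type == "Imports"]) %>%
  ungroup() %>%
  mutate(country_name = fct_reorder(country_name, imports_2018))

# One-step variant: zero out the Exports rows, then sum per country
one_step <- shares %>%
  mutate(country_name = fct_reorder(
    country_name,
    if_else(type == "Imports", country_type_pct, 0),
    .fun = sum
  ))

# Lowest import share first: B (0.2), C (0.5), A (0.7)
levels(with_helper$country_name)
levels(one_step$country_name)
```

With `coord_flip()`, the first level is drawn at the bottom, so the country with the largest import share ends up on top; `fct_reorder(..., .desc = TRUE)` reverses that.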

Related

Issue with filter inside of geom in ggplot. "comparison (1) is possible only for atomic and list types"

I have a simple two-column time-series dataset that looks like this:
Date       Signups
22-Feb-18  601
23-Feb-18  500
24-Feb-18  6000
...
27-Apr-22  999
28-Apr-22  998
29-Apr-22  123
30-Apr-22  321
I'm trying to make a simple line chart that shows the monthly total over time, plus a point at the most recent month, but the filter inside geom_point is giving me a hard time. Here's what I have:
library(tidyverse)
library(scales)
library(lubridate)
signups %>%
mutate(Date = dmy(Date)) %>%
group_by(month(Date), year(Date)) %>%
mutate(month = paste0(month(Date),"-",year(Date))) %>%
mutate(month = my(month)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup() %>%
dplyr::filter(month >= "2018-03-01") %>%
ggplot(aes(month, monthly_total)) +
geom_line() +
geom_point(data = signups %>% dplyr::filter(month == "2022-03-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
If I comment out the geom_point, I get the line chart I'm looking for. But when geom_point is included, it throws this error:
Error in dplyr::filter(., month == "2022-03-01") :
Caused by error in `month == "2022-03-01"`:
! comparison (1) is possible only for atomic and list types
I've tried using subset instead of filter and it didn't help. Let me know if you have any suggestions. Thanks!
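The comment from Limey got us there. The month column was only created inside the pipe, so the signups data passed to geom_point() had no such column, and month there referred to lubridate's month() function instead. Saving the transformed data first fixes it. Here's what I needed to do: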
library(tidyverse)
library(scales)
library(lubridate)
# Save the transformed data so the geom_point() layer below can see just_month
signups <- signups %>%
mutate(Date = dmy(Date)) %>%
mutate(just_month = paste0(month(Date),"-",year(Date))) %>%
mutate(just_month = my(just_month)) %>%
group_by(month(Date), year(Date)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup()
signups %>%
dplyr::filter(just_month >= "2018-03-01") %>%
ggplot(aes(just_month, monthly_total)) +
geom_line() +
# Highlight a single month; this data frame does contain just_month
geom_point(data = dplyr::filter(signups, just_month == "2022-04-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
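A compact sketch of the same fix on hypothetical toy data in the question's shape (the column name `Signups` and the values are assumptions based on the sample shown). It swaps the `paste0()`/`my()` round trip for `lubridate::floor_date()`, collapses to one row per month with `summarise()`, and builds the highlighted point from that summarised table, so the `month` column exists in the data of both layers. Picking the point with `max(month)` rather than a hard-coded date is my own choice.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical toy data: Date stored as "dd-Mon-yy" text, as in the question
signups_raw <- tibble::tibble(
  Date = c("22-Feb-18", "23-Feb-18", "05-Mar-18", "28-Apr-22", "30-Apr-22"),
  Signups = c(601, 500, 6000, 998, 321)
)

# One row per calendar month, keyed by the first day of that month
monthly <- signups_raw %>%
  mutate(Date = dmy(Date), month = floor_date(Date, "month")) %>%
  group_by(month) %>%
  summarise(monthly_total = sum(Signups), .groups = "drop")

# The most recent month; `month` is a real column here, not lubridate::month()
latest <- monthly %>% filter(month == max(month))

p <- ggplot(monthly, aes(month, monthly_total)) +
  geom_line() +
  geom_point(data = latest) +
  expand_limits(y = 0)
```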

How to display variable and value labels in ggplot bar chart?

I'm trying to get the variable labels and value labels to be displayed on a stacked bar chart.
library(tidyverse)
data <- haven::read_spss("http://staff.bath.ac.uk/pssiw/stats2/SAQ.sav")
data %>%
select(Q01:Q04) %>%
gather %>%
group_by(key, value) %>%
tally %>%
mutate(n = n/sum(n)*100, round = 1) %>%
mutate(n = round(n, 2)) %>%
ggplot(aes(x=key, y=n, fill=factor(value))) +
geom_col() +
geom_text(aes(label=as_factor(n)), position=position_stack(.5)) +
coord_flip() +
theme(aspect.ratio = 1/3) + scale_fill_brewer(palette = "Set2")
Instead of Q01, Q02, Q03, Q04, I would like to use the variable labels.
library(labelled)
var_label(data$Q01)
#> Statistics makes me cry
var_label(data$Q02)
#> My friends will think Im stupid for not being able to cope with SPSS
var_label(data$Q03)
#> Standard deviations excite me
var_label(data$Q04)
#> I dream that . . .
I would also like to use the associated value labels:
val_labels(data$Q01)
#> Strongly agree    Agree    Neither    Disagree    Strongly disagree    Not answered
#>              1        2          3           4                    5               9
I tried using label = as_factor(n) but that didn't work.
One option is to extract the variable labels into a lookup table and join them back on:
library(forcats)
library(haven)
library(dplyr)
library(tidyr)
library(labelled)
library(ggplot2)
subdat <- data %>%
select(Q01:Q04)
# Lookup table: name = Q01..Q04, value = its variable label
d1 <- subdat %>%
summarise(across(everything(), var_label)) %>%
pivot_longer(everything())
subdat %>%
pivot_longer(everything(), values_to = 'val') %>%
left_join(d1, by = 'name') %>%
mutate(name = value, value = NULL) %>% # replace Q01..Q04 by the variable label
count(name, val) %>%
group_by(name) %>% # percentages within each question
mutate(n = round(n/sum(n)*100, 2)) %>%
ungroup() %>%
mutate(labels = as_factor(val)) %>% # value labels, e.g. 1 -> "Strongly agree"
ggplot(aes(x=name, y=n, fill=labels)) +
geom_col() +
geom_text(aes(label=n),
position=position_stack(.5)) +
coord_flip() +
theme(aspect.ratio = 1/3) +
scale_fill_brewer(palette = "Set2")
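A self-contained sketch of the same labelling idea that runs without downloading SAQ.sav. The two-item survey `svy`, its values and the `agree` label set are hypothetical. Instead of a join, it looks the variable labels up in a named vector and converts value labels with `haven::as_factor()` before reshaping, so the factor levels keep the scale order (Strongly agree to Strongly disagree) rather than alphabetical order. Percentages are computed within each item by grouping before dividing.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(haven)
library(labelled)

# Hypothetical two-item survey standing in for SAQ.sav
agree <- c("Strongly agree" = 1, "Agree" = 2, "Neither" = 3,
           "Disagree" = 4, "Strongly disagree" = 5)
svy <- tibble::tibble(
  Q01 = haven::labelled(c(1, 2, 2, 5), agree),
  Q02 = haven::labelled(c(2, 2, 3, 4), agree)
)
var_label(svy$Q01) <- "Statistics makes me cry"
var_label(svy$Q02) <- "Standard deviations excite me"

# Named character vector: "Q01" -> its variable label
item_labels <- vapply(svy, var_label, character(1))

long <- svy %>%
  mutate(across(everything(), haven::as_factor)) %>%   # value labels -> factor levels
  pivot_longer(everything(), names_to = "item", values_to = "response") %>%
  mutate(item = unname(item_labels[item])) %>%          # Q01 -> variable label
  count(item, response) %>%
  group_by(item) %>%                                     # percentages within each item
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

p <- ggplot(long, aes(item, pct, fill = response)) +
  geom_col() +
  geom_text(aes(label = pct), position = position_stack(vjust = 0.5)) +
  coord_flip()
```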

ggplot bar chart limits fix

I am trying to fix the limits of a bar chart so the horizontal bars don't run past the plot area. I could set the limit manually using limits=c(0,3000000), but I suspect there is a way to make it scale automatically. The code:
corona.conf <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header = TRUE,check.names=FALSE)
corona.conf %>% .[,c(-1,-3,-4)] %>% melt(.,variable.name="day") %>%
group_by(`Country/Region`,day) %>% summarize(value=sum(value)) %>%
mutate(day=as.Date(day,format='%m/%d/%y')) %>% mutate(count=value-lag(value)) %>%
replace(is.na(.),0) %>% group_by(`Country/Region`) %>% summarize(count=sum(count)) %>%
top_n(20) %>% arrange(desc(count)) %>% ggplot(.,aes(x=reorder(`Country/Region`,count),y=count,fill=count)) +
geom_bar(stat = "identity") + coord_flip() + geom_text(aes(label=format(count,big.mark = ",")),hjust=-0.1,size=4) +
scale_y_continuous(expand = c(0,0))
I thought of something like:
scale_y_continuous(expand = c(0,0),limits=c(0,max(count)))
Appreciate any suggestions on the fix.
I think the code is easier to read and run if you split it into several parts.
We can use layer_data() to get the data from a ggplot object, and then calculate the maximum from that. Based on your example, I would also suggest multiplying the maximum by 1.7 to leave room for the text labels.
library(tidyverse)
library(reshape2) # melt() for data frames
corona.conf <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header = TRUE,check.names=FALSE)
dat <- corona.conf %>% .[,c(-1,-3,-4)] %>% melt(.,variable.name="day") %>%
group_by(`Country/Region`,day) %>% summarize(value=sum(value)) %>%
mutate(day=as.Date(day,format='%m/%d/%y')) %>% mutate(count=value-lag(value)) %>%
replace(is.na(.),0) %>% group_by(`Country/Region`) %>% summarize(count=sum(count)) %>%
top_n(20) %>% arrange(desc(count))
p <- ggplot(dat, aes(x=reorder(`Country/Region`,count),y=count,fill=count)) +
geom_bar(stat = "identity") +
coord_flip() +
geom_text(aes(label=format(count,big.mark = ",")),hjust=-0.1,size=4)
p +
scale_y_continuous(expand = c(0,1), limits = c(0, max(layer_data(p)$y) * 1.7))
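An alternative that avoids computing the maximum at all: `expansion()` (ggplot2 3.3.0 or later) pads the scale by a fraction of the data range, so the headroom grows with the data automatically. The data frame `dat` below is hypothetical, and the 20% upper padding is an assumption; enlarge it if long labels still get clipped. The answer's `layer_data()` limit is computed too, for comparison.

```r
library(ggplot2)

# Hypothetical totals standing in for the COVID summary table
dat <- data.frame(
  country = c("A", "B", "C"),
  count = c(3000000, 1500000, 250000)
)

p <- ggplot(dat, aes(x = reorder(country, count), y = count, fill = count)) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = format(count, big.mark = ",")), hjust = -0.1, size = 4) +
  # No padding below zero; 20% of the data range added above the longest bar
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)))

# The answer's approach: read the bar heights back from the first layer
upper <- max(layer_data(p)$y) * 1.7
```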

Working with tidyverse, ggplot, and broom to add confidence interval to a proportion test (prop.test) in R

Let's say I'm working with proportions and have two main variables (sex and pain_level). It's not difficult to plot them:
With tidyverse and broom (and thanks for this link here: Calling prop.test function in R with dplyr) I can compare if the proportions are statistically different.
Now comes the question!
I want to add error bars to the plot. I know it's not as difficult as I'm making it out to be, but I could not find a way to do it. I've tried to replicate this link (http://www.andrew.cmu.edu/user/achoulde/94842/labs/lab07_solution.html), but I'm trying to stay within the tidyverse.
The desired output should be something like this:
Please feel free to use the script below, which simulates the original dataset.
library(tidyverse)
ds <- data.frame(sex = rep(c("M","F"), 18),
pain_level = c("High","Moderate","low"))
#plot
ds %>%
group_by(pain_level, sex) %>%
summarise(n=n()) %>%
mutate(prop = n/sum(n)*100) %>%
ggplot(., aes(x = sex, fill = pain_level, y = prop)) +
geom_bar(stat = "summary") +
facet_wrap( ~ pain_level) +
theme(legend.position = "none")
#p values of proportion test
ds %>%
rowwise %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>% #compute totals
distinct(., pain_level, .keep_all= TRUE) %>% #keep only one value of the row
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst)
I think the following might roughly resemble your desired output:
ds %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>%
rowwise() %>%
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst) %>%
ggplot(aes(sex, estimate, group = pain_level)) +
geom_col(aes(fill = pain_level)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
facet_wrap(~ pain_level)
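A variant of the answer that drops `rowwise()` and runs `prop.test()` per row with `purrr::map2()` instead; the purrr swap is my substitution, not the answer's method. It reuses the simulated `ds` from the question. In that simulation every sex/pain-level cell holds 6 of 12 cases, so every estimate is exactly 0.5 and only the interval width is informative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

ds <- data.frame(sex = rep(c("M", "F"), 18),
                 pain_level = c("High", "Moderate", "low"))

ci <- ds %>%
  count(pain_level, sex, name = "cases") %>%
  group_by(pain_level) %>%
  mutate(pop = sum(cases)) %>%               # total per pain level
  ungroup() %>%
  # One tidied prop.test() result per row, stored in a list-column
  mutate(tst = map2(cases, pop,
                    ~ broom::tidy(prop.test(.x, .y, conf.level = 0.95)))) %>%
  unnest(tst)

p <- ggplot(ci, aes(sex, estimate)) +
  geom_col(aes(fill = pain_level)) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) +
  facet_wrap(~ pain_level) +
  theme(legend.position = "none")
```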

Make geom_bar show values in ascending order

Although my query returns values in descending order, ggplot then displays them alphabetically instead of in ascending order.
Known solutions to this problem don't seem to work. They suggest using reorder() or factor() on the values, which didn't work in this case.
This is my code:
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y = movies_made, fill = studio, label = as.character(movies_made))) +
geom_bar(stat = 'identity') +
geom_label(label.size = 1, size = 5, color = "white") +
theme(legend.position = "none") +
ylab("Movies Made") +
xlab("Studio")
For those wanting a more complete example, here's where I got:
library(dplyr)
library(ggplot2)
# get some dummy data
boxoffice <- boxoffice::boxoffice(dates = as.Date("2017-1-1"))
df <- boxoffice %>%
group_by(distributor) %>%
summarise(movies_made = n()) %>%
top_n(10, movies_made) %>%
# factor levels sorted by movies_made (descending); ggplot uses level order
mutate(studio = reorder(distributor, -movies_made))
ggplot(df, aes(x = studio, y = movies_made)) + geom_col()
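You'll need to convert boxoffice$studio to a factor (here an ordered one) whose levels are in the order you want. ggplot orders a discrete axis by factor levels, and a plain character column falls back to alphabetical order, so once the levels follow the sorted rows, the bars follow that order too. Your dplyr chain will look like this: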
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
ungroup() %>% # ungroup
mutate(studio = factor(studio, studio, ordered = TRUE)) %>% # convert variable
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y... (rest of plotting code)
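A minimal offline sketch of why the level order matters, using hypothetical studio counts in place of the boxoffice data. Base R's `reorder()` turns the character column into a factor whose levels are sorted by `movies_made`; negating the key gives the descending order instead. Sorting the rows with `arrange()` alone does not change the levels, which is why it had no visible effect.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts standing in for the boxoffice summary
studios <- tibble::tibble(
  studio = c("Warner", "Disney", "Universal", "Lionsgate"),
  movies_made = c(12, 20, 15, 7)
)

# Ascending: levels sorted by movies_made, smallest first
asc <- studios %>% mutate(studio = reorder(studio, movies_made))

# Descending: negate the sort key
desc_df <- studios %>% mutate(studio = reorder(studio, -movies_made))

levels(asc$studio)      # "Lionsgate" "Warner" "Universal" "Disney"
levels(desc_df$studio)  # "Disney" "Universal" "Warner" "Lionsgate"

p <- ggplot(asc, aes(x = studio, y = movies_made, fill = studio)) +
  geom_col() +
  theme(legend.position = "none")
```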
