How to display variable and value labels in ggplot bar chart?

I'm trying to get the variable labels and value labels to be displayed on a stacked bar chart.
library(tidyverse)
data <- haven::read_spss("http://staff.bath.ac.uk/pssiw/stats2/SAQ.sav")
data %>%
select(Q01:Q04) %>%
gather %>%
group_by(key, value) %>%
tally %>%
mutate(n = n/sum(n)*100, round = 1) %>%
mutate(n = round(n, 2)) %>%
ggplot(aes(x=key, y=n, fill=factor(value))) +
geom_col() +
geom_text(aes(label=as_factor(n)), position=position_stack(.5)) +
coord_flip() +
theme(aspect.ratio = 1/3) + scale_fill_brewer(palette = "Set2")
Instead of Q01, Q02, Q03, Q04, I would like to use the variable labels.
library(labelled)
var_label(data$Q01)
Statistics makes me cry
var_label(data$Q02)
My friends will think Im stupid for not being able to cope with SPSS
var_label(data$Q03)
Standard deviations excite me
var_label(data$Q04)
I dream that . . .
along with associated value labels
val_labels(data$Q01)
   Strongly agree             Agree           Neither          Disagree Strongly disagree      Not answered
                1                 2                 3                 4                 5                 9
I tried using label = as_factor(n) but that didn't work.

We can extract the variable labels into a lookup table, join them onto the reshaped data, and convert the values to their value labels:
library(dplyr)
library(tidyr)
library(ggplot2)
library(haven)
library(labelled)
subdat <- data %>%
  select(Q01:Q04)
# one row per question: name = column name, value = variable label
d1 <- subdat %>%
  summarise(across(everything(), var_label)) %>%
  pivot_longer(everything())
subdat %>%
  pivot_longer(everything(), values_to = 'val') %>%
  left_join(d1, by = 'name') %>%
  mutate(name = value, value = NULL) %>%
  count(name, val) %>%
  # count() returns ungrouped data, so group by question to get per-question percentages
  group_by(name) %>%
  mutate(n = round(n/sum(n)*100, 2)) %>%
  ungroup() %>%
  # as_factor() replaces each numeric code with its value label
  mutate(labels = as_factor(val)) %>%
  ggplot(aes(x=name, y=n, fill=labels)) +
  geom_col() +
  geom_text(aes(label=n),
            position=position_stack(.5)) +
  coord_flip() +
  theme(aspect.ratio = 1/3) +
  scale_fill_brewer(palette = "Set2")
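Since the SAQ.sav download may not always be reachable, here is a minimal self-contained sketch of the same idea on made-up data. The `toy` data frame, its two questions, and the three-level `agree` scale are assumptions for illustration only; `labelled()`, `var_label()`, and `as_factor()` are the haven/labelled functions already used above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(haven)
library(labelled)

# Hypothetical value labels and toy survey data (not the real SAQ file)
agree <- c("Agree" = 1, "Neither" = 2, "Disagree" = 3)
toy <- tibble(
  Q01 = labelled(c(1, 2, 2, 3), labels = agree, label = "Statistics makes me cry"),
  Q02 = labelled(c(1, 1, 3, 3), labels = agree, label = "Standard deviations excite me")
)

# Named character vector: column name -> variable label
question_text <- unlist(var_label(toy))

plot_data <- toy %>%
  pivot_longer(everything(), names_to = "question", values_to = "answer") %>%
  mutate(
    question = unname(question_text[question]),  # swap Q01/Q02 for the label text
    answer = as_factor(answer)                   # swap codes for value labels
  ) %>%
  count(question, answer) %>%
  group_by(question) %>%
  mutate(pct = round(n / sum(n) * 100, 1)) %>%   # percentage within each question
  ungroup()

p <- ggplot(plot_data, aes(x = question, y = pct, fill = answer)) +
  geom_col() +
  geom_text(aes(label = pct), position = position_stack(vjust = 0.5)) +
  coord_flip()
```

A named lookup vector avoids the join when only a handful of columns are involved; the join approach scales better when the labels already live in a separate table.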

Related

Issue with filter inside of geom in ggplot. "comparison (1) is possible only for atomic and list types"

I have a simple two-column time-series dataset that looks like this:
Date Signups
22-Feb-18 601
23-Feb-18 500
24-Feb-18 6000
...
27-Apr-22 999
28-Apr-22 998
29-Apr-22 123
30-Apr-22 321
And I'm trying to make a simple line chart that shows the monthly total over time and then a point at the most recent month. But the filter within the geom_point is giving me a hard time. Here's what I have:
library(tidyverse)
library(scales)
library(lubridate)
signups %>%
mutate(Date = dmy(Date)) %>%
group_by(month(Date), year(Date)) %>%
mutate(month = paste0(month(Date),"-",year(Date))) %>%
mutate(month = my(month)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup() %>%
dplyr::filter(month >= "2018-03-01") %>%
ggplot(aes(month, monthly_total)) +
geom_line() +
geom_point(data = signups %>% dplyr::filter(month == "2022-03-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
If I comment out the geom_point it gives me the line chart that I'm looking for. But when the geom_point is included here it throws this error:
Error in dplyr::filter(., month == "2022-03-01") :
Caused by error in `month == "2022-03-01"`:
! comparison (1) is possible only for atomic and list types
I've tried using subset instead of filter and it didn't help. Let me know if you have any suggestions. Thanks!
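The comment from Limey got us there. Inside geom_point(), the raw signups data frame has no month column, so month resolved to the lubridate::month() function, and comparing a function with a string raises that error. Saving the transformed data first and filtering that fixed it. Here's what I needed to do: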
signups <- signups %>%
  mutate(Date = dmy(Date)) %>%
  mutate(just_month = paste0(month(Date),"-",year(Date))) %>%
  mutate(just_month = my(just_month)) %>%
  group_by(month(Date), year(Date)) %>%
  mutate(monthly_total = sum(signups)) %>%
  ungroup()
signups %>%
  dplyr::filter(just_month >= "2018-03-01") %>%
  ggplot(aes(just_month, monthly_total)) +
  geom_line(aes(just_month, monthly_total)) +
  geom_point(data = dplyr::filter(signups, just_month == "2022-04-01")) +
  expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
  scale_y_continuous(labels = comma)
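For anyone hitting the same error, here is a smaller sketch of the pattern on made-up data. The `signups_toy` table and its values are hypothetical, and it uses `floor_date()` from lubridate as a possible shortcut for building the month column, which is a swap for the paste0()/my() approach above rather than what the asker ran. Parsing month abbreviations with `dmy()` assumes an English locale.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for the asker's data
signups_toy <- tibble(
  Date = c("22-Feb-18", "23-Feb-18", "01-Mar-18", "15-Mar-18"),
  Signups = c(601, 500, 6000, 999)
)

# Save the summarised data so every layer can refer to the same columns
monthly <- signups_toy %>%
  mutate(Date = dmy(Date),
         just_month = floor_date(Date, "month")) %>%  # first day of each month
  group_by(just_month) %>%
  summarise(monthly_total = sum(Signups), .groups = "drop")

# The point layer filters the saved data, where just_month really is a column
latest <- filter(monthly, just_month == max(just_month))

p <- ggplot(monthly, aes(just_month, monthly_total)) +
  geom_line() +
  geom_point(data = latest) +
  expand_limits(y = 0)
```

Filtering on `max(just_month)` instead of a hard-coded date keeps the highlighted point on the most recent month as new data arrives.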

R tidytext graph comes up vertical and not horizontal

I'm trying to run a sentiment analysis using the tidytext code here, but my graph comes out vertical and the output doesn't make sense compared to the original, which is horizontal. How can I fix this?
#Unnest tokens
edAItext = edAI %>% select(Group, Participant_ID, Brainscape_Pattern) %>%
unnest_tokens(word, Brainscape_Pattern)
# Inner join
bing_word_counts <- edAItext %>%
inner_join(get_sentiments("bing")) %>%
count(word, sentiment, sort = TRUE) %>%
ungroup()
#Check
bing_word_counts
#Plot
bing_word_counts %>%
group_by(sentiment) %>%
slice_max(n, n = 5) %>%
ungroup() %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(n, word, fill = sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(~sentiment, scales = "free_y") +
labs(x = "Contribution to sentiment",
y = NULL)
This is how it looks:
This is how it's supposed to look:

ggplot with stacked bar chart ordered by a separate variable

I am trying to create an "ordered" stacked bar chart in which each stack is colored by one variable and ordered by another. Please find my example below:
library(ggplot2)
library(dplyr)
data(iris)
chart.df.st00 <- iris %>%
as_tibble %>%
mutate(`Sepal.Length`=round(`Sepal.Length`)) %>%
count(Species,`Sepal.Length`) %>%
mutate(`Sepal.Length`=as.character(`Sepal.Length`)) %>%
group_by(Species) %>%
mutate(percent=n/sum(n)*100) %>%
arrange(desc(n)) %>%
mutate(rank=1:n()) %>%
ungroup %>%
mutate(rank=paste(Species,rank,sep='-'))
chart.df.st01 <- chart.df.st00 %>%
left_join(chart.df.st00 %>%
distinct(`Sepal.Length`) %>%
mutate(color=colorRampPalette(
RColorBrewer::brewer.pal(length(unique(chart.df.st00$`Sepal.Length`)),'Set1'))(length(unique(chart.df.st00$`Sepal.Length`)))))
chart.color1.st00 <- chart.df.st01 %>%
distinct(rank,color) %>%
arrange(rank)
chart.color1.st01 <- chart.color1.st00$color
names(chart.color1.st01) <- chart.color1.st00$rank
chart1 <- ggplot(data=chart.df.st01,
aes(x=1,y=percent)) +
geom_bar(aes(fill=rank),stat='identity') +
scale_fill_manual(values=chart.color1.st01) +
facet_wrap(.~Species,ncol = 1) +
scale_y_reverse(breaks=c(0,25,50,75,100),labels=c(100,75,50,25,0)) +
coord_flip()
chart.color2.st00 <- chart.df.st01 %>%
distinct(color,Sepal.Length) %>%
arrange(Sepal.Length)
chart.color2.st01 <- chart.color2.st00$color
names(chart.color2.st01) <- chart.color2.st00$`Sepal.Length`
chart2 <- ggplot(data=chart.df.st01,
aes(x=1,y=percent)) +
geom_bar(aes(fill=`Sepal.Length`),stat='identity') +
scale_fill_manual(values=chart.color2.st01) +
facet_wrap(.~Species,ncol = 1) +
coord_flip()
In my example, each stack is filled by Sepal.Length and ordered by rank. chart1 has the stack ordering I want but not the legend, while chart2 has the legend I want but not the stack ordering.
Is there a way to have a single chart with the stacked bar of chart1 and the legend of chart2?
Thanks!
Using the code for your second chart, this can be achieved by additionally mapping rank to the group aesthetic:
library(ggplot2)
library(dplyr)
data(iris)
chart.df.st00 <- iris %>%
as_tibble %>%
mutate(`Sepal.Length`=round(`Sepal.Length`)) %>%
count(Species,`Sepal.Length`) %>%
mutate(`Sepal.Length`=as.character(`Sepal.Length`)) %>%
group_by(Species) %>%
mutate(percent=n/sum(n)*100) %>%
arrange(desc(n)) %>%
mutate(rank=1:n()) %>%
ungroup %>%
mutate(rank=paste(Species,rank,sep='-'))
chart.df.st01 <- chart.df.st00 %>%
left_join(chart.df.st00 %>%
distinct(`Sepal.Length`) %>%
mutate(color=colorRampPalette(
RColorBrewer::brewer.pal(length(unique(chart.df.st00$`Sepal.Length`)),'Set1'))(length(unique(chart.df.st00$`Sepal.Length`)))))
#> Joining, by = "Sepal.Length"
chart.color2.st00 <- chart.df.st01 %>%
distinct(color,Sepal.Length) %>%
arrange(Sepal.Length)
chart.color2.st01 <- chart.color2.st00$color
names(chart.color2.st01) <- chart.color2.st00$`Sepal.Length`
ggplot(data=chart.df.st01,
aes(x=1,y=percent)) +
geom_bar(aes(fill=`Sepal.Length`, group = rank), stat='identity') +
scale_fill_manual(values = chart.color2.st01) +
facet_wrap(.~Species,ncol = 1) +
scale_y_reverse(breaks=c(0,25,50,75,100),labels=c(100,75,50,25,0)) +
coord_flip()
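Stripped down to its core, the trick is that fill controls the colors and the legend while group controls how the segments are split and ordered in the stack. A minimal sketch on made-up data (the `toy` data frame and its `stack_order` column are hypothetical):

```r
library(ggplot2)

# Hypothetical data: one bar with three segments
toy <- data.frame(
  category = c("x", "y", "z"),
  share = c(20, 50, 30),
  stack_order = c(2, 1, 3)   # desired stacking order, independent of category
)

# fill drives the legend and colors; group drives the stacking order
p <- ggplot(toy, aes(x = 1, y = share)) +
  geom_col(aes(fill = category, group = stack_order)) +
  coord_flip()

# Inspect the computed layer to see which group each segment landed in
built <- layer_data(p)
```

Because the legend is built from fill alone, it stays sorted by `category` even though the segments follow `stack_order`.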

ggplot bar chart limits fix

I am trying to fix the limits of a bar chart so the horizontal bars don't extend past the plot area. I could set the limit manually using limits=c(0,3000000), but I guess there is a way to make it scale automatically. The code:
corona.conf <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header = TRUE,check.names=FALSE)
corona.conf %>% .[,c(-1,-3,-4)] %>% melt(.,variable.name="day") %>%
group_by(`Country/Region`,day) %>% summarize(value=sum(value)) %>%
mutate(day=as.Date(day,format='%m/%d/%y')) %>% mutate(count=value-lag(value)) %>%
replace(is.na(.),0) %>% group_by(`Country/Region`) %>% summarize(count=sum(count)) %>%
top_n(20) %>% arrange(desc(count)) %>% ggplot(.,aes(x=reorder(`Country/Region`,count),y=count,fill=count)) +
geom_bar(stat = "identity") + coord_flip() + geom_text(aes(label=format(count,big.mark = ",")),hjust=-0.1,size=4) +
scale_y_continuous(expand = c(0,0))
I thought something like:
scale_y_continuous(expand = c(0,0),limits=c(0,max(count)))
I'd appreciate any suggestions on the fix.
I think it would be easier to read and run the code by splitting it into several parts.
We can use layer_data to get the information from a ggplot object and then calculate the maximum from that. Based on your example, I would also suggest multiplying the maximum by 1.7 to leave room for the text.
library(tidyverse)
library(data.table)
corona.conf <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header = TRUE,check.names=FALSE)
dat <- corona.conf %>%
  .[,c(-1,-3,-4)] %>%
  melt(.,variable.name="day") %>%
  group_by(`Country/Region`,day) %>%
  summarize(value=sum(value)) %>%
  mutate(day=as.Date(day,format='%m/%d/%y')) %>%
  mutate(count=value-lag(value)) %>%
  replace(is.na(.),0) %>%
  group_by(`Country/Region`) %>%
  summarize(count=sum(count)) %>%
  top_n(20) %>%
  arrange(desc(count))
p <- ggplot(dat, aes(x=reorder(`Country/Region`,count),y=count,fill=count)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  geom_text(aes(label=format(count,big.mark = ",")),hjust=-0.1,size=4)
p +
  scale_y_continuous(expand = c(0,1), limits = c(0, max(layer_data(p)$y) * 1.7))
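A possible alternative that avoids computing the maximum by hand is ggplot2's `expansion()` helper (available in ggplot2 3.3.0 and later), which pads the axis by a fraction of the data range. The sketch below uses a made-up `toy` data frame instead of the COVID download, and the 15% padding is an assumed value that would need tuning for longer labels.

```r
library(ggplot2)

# Hypothetical totals standing in for the per-country counts
toy <- data.frame(
  country = c("A", "B", "C"),
  count = c(3000, 1800, 900)
)

p <- ggplot(toy, aes(x = reorder(country, count), y = count, fill = count)) +
  geom_col() +
  geom_text(aes(label = format(count, big.mark = ",")), hjust = -0.1, size = 4) +
  coord_flip() +
  # no padding at zero, 15% of the data range added beyond the longest bar
  scale_y_continuous(expand = expansion(mult = c(0, 0.15)))
```

Because the padding is relative, the axis keeps scaling with the data as the counts grow.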

Add ylim to geom_col

I would like the y-axis (which is flipped in the plot) to start at some arbitrary value, like 7.5.
After a little research, I came across ylim, but in this case it gives me a message and a warning:
Scale for 'y' is already present. Adding another scale for 'y', which will
replace the existing scale.
Warning message:
Removed 10 rows containing missing values (geom_col).
This is my code, and a way to download the data I'm using:
install.packages("remotes")
remotes::install_github("tweed1e/werfriends")
library(werfriends)
friends_raw <- werfriends::friends_episodes
library(tidytext)
library(tidyverse)
#"best" writers with at least 10 episodes
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
ylim(7.5,10)
You should use coord_cartesian to zoom in on a particular region (official documentation: https://ggplot2.tidyverse.org/reference/coord_cartesian.html).
With your example, the code should look something like this:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
coord_cartesian(ylim = c(7.5,10))
If this is not working please provide a reproducible example of your dataset (see: How to make a great R reproducible example)
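I found the solution. With my actual plot, the answer submitted by @dc37 didn't work because coord_flip() and coord_cartesian() exclude each other: a plot has only one coordinate system, so the later call replaces the earlier one. The way to do this is to pass ylim to coord_flip():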
friends_raw %>%
  unnest(writers) %>%
  group_by(writers) %>%
  summarize(mean_rating = mean(rating),
            n = n()) %>%
  arrange(mean_rating) %>%
  filter(n > 10) %>%
  head(10) %>%
  mutate(writers = fct_reorder(writers, mean_rating)) %>%
  ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
  theme(legend.position = "None") +
  coord_flip(ylim = c(8,8.8))
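To see why ylim() dropped the bars while coord_flip(ylim = ...) keeps them, here is a small sketch on made-up ratings (the `ratings` data frame is hypothetical). Scale limits censor out-of-range values before drawing, and every bar starts at 0, which lies outside the limits; coordinate limits only zoom the view.

```r
library(ggplot2)

# Hypothetical mean ratings for three writers
ratings <- data.frame(
  writer = c("A", "B", "C"),
  mean_rating = c(8.2, 8.5, 8.7)
)

base <- ggplot(ratings, aes(x = writer, y = mean_rating)) +
  geom_col()

# ylim() sets scale limits: the bars' lower edge (0) is out of range and
# becomes NA, which is what triggers the "Removed ... rows" warning on draw
p_limits <- base + ylim(7.5, 10) + coord_flip()

# coord_flip(ylim = ...) zooms without touching the data, so the bars survive
p_zoom <- base + coord_flip(ylim = c(7.5, 10))

# Compare the lower edges of the bars in the two built plots
ymin_limits <- layer_data(p_limits)$ymin
ymin_zoom <- layer_data(p_zoom)$ymin
```

One caveat worth keeping in mind: with the axis no longer starting at zero, bar lengths stop being proportional to the ratings, so a dot plot may be a more honest choice for a zoomed view.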

Resources