ggplot2 bar plot by two groups and mean of y variable - r

I'm trying to create a bar plot for which I have two groups and the y variable is the mean of one of those groups.
[Image: sample bar graph]
So looking at the above bar graph in the photo, I have bars grouped by country and prosocial, and on the y-axis I have taken the fraction of prosocial individuals. However, I am only able to create a bar plot that takes the mean of prosocial and groups it by country. Basically, it's just one bar per country, which is not exactly what I'm looking for. So far this is the code I've been using to group the data for the bar plot, which has been somewhat unsuccessful.
plotData <- myData2[!is.na(myData2$prosocial), ]
plotData <- plotData %>%
  mutate(mean_prosocial = mean(prosocial)) %>%
  group_by(country) %>%
  summarise(mean_prosocial = mean(prosocial), se = sd(prosocial) / sqrt(n()))
This only groups by country and if I want to group by prosocial as well, I obviously just get NAs for the mean variable. Below is a link to the working data:
workable data.
Thanks.

Say you want to find the fraction of prosocial/non-prosocial across countries:
library(dplyr)
library(ggplot2)
First, count the non-missing observations in each country. These counts serve as the denominators when the fractions are calculated later.
count_country <- myData2 %>%
  filter(!is.na(prosocial)) %>%
  group_by(country) %>%
  summarise(n = n()) %>%
  ungroup()
Next, count the prosocial and non-prosocial observations within each country.
count_prosocial <- myData2 %>%
  filter(!is.na(prosocial)) %>%
  group_by(country, prosocial) %>%
  summarise(n = n()) %>%
  mutate(prosocial = as.factor(prosocial))
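Merge the two data frames by country name and compute the fractions (after the join, n.x is the count per country and prosocial group, and n.y is the total per country):
df <- count_prosocial %>%
  left_join(count_country, by = "country") %>%
  mutate(frac = round(n.x / n.y, 2))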
Display fractions across different countries using facet_wrap:
ggplot(data = df, aes(x = prosocial, y = frac, fill = prosocial)) +
  geom_bar(stat = "identity") +
  geom_text(aes(x = prosocial, y = frac, label = frac),
            position = position_dodge(width = 1),
            vjust = 2, size = 3, color = "white", fontface = "bold") +
  facet_wrap(~country) +
  labs(y = "Fraction of prosocial/non-prosocial") +
  scale_fill_discrete(labels = c("Prosocial", "Individualist")) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
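As a more compact variant of the steps above, the counts and fractions can also be computed in a single pipeline. This is only a sketch: `toy` is a made-up stand-in for myData2 (the real data is not shown here), and prosocial is assumed to be coded 0/1 with some missing values. It also puts country on the x-axis with dodged bars instead of facets, which may be closer to the sample graph in the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for myData2; prosocial is assumed to be coded 0/1.
toy <- data.frame(
  country   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  prosocial = c(1, 1, 0, NA, 1, 0, 0, 0)
)

frac_data <- toy %>%
  filter(!is.na(prosocial)) %>%            # drop missing answers
  count(country, prosocial) %>%            # one row per country/prosocial pair
  group_by(country) %>%
  mutate(frac = round(n / sum(n), 2)) %>%  # share within each country
  ungroup() %>%
  mutate(prosocial = factor(prosocial))

# Two bars per country, one for each level of prosocial.
p <- ggplot(frac_data, aes(x = country, y = frac, fill = prosocial)) +
  geom_col(position = "dodge") +
  labs(y = "Fraction within country")
```

Printing `p` draws the chart. Swapping `position = "dodge"` for `facet_wrap(~country)` would give a layout like the one in the answer above.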

Related

How to Create A Stacked Column Plot of Multiple Variables in R (ggplot2)

Currently, I have a data frame that looks like this:
Month        Total Revenue   Dues    Total Retail   Other Revenue
8/31/2020    36615.00        30825   1200           4590
9/30/2020    38096.69        34322   2779.4         995.29
10/31/2020   43594.15        35936   2074.68        5583.47
11/30/2020   51856.9         43432   993.5          7431.4
I want to create a plot (which I imagine should be a stacked column) in ggplot that shows the revenue mix by type for each month. For my data, Total Revenue is the sum of dues, total retail and other revenue. Dues, total retail and other revenue should stack on top of each other, each having its own colour. I also want labels on the column chart describing what percentage of the total revenue is from each source of income.
I can plot the total revenue with no issues, but I cannot seem to wrap my head around splitting the columns up. My only successful example so far is as follows.
# Create Column Plot of Total Revenue
library(tidyverse)
plot1 <- ggplot(August_Data, aes(Month_End, `Total Revenue`)) + geom_col()
This example obviously does not split up the revenue into the correct subcategories. I thought that using fill might work, but I get an error:
plot1 <- ggplot(August_Data, aes(Month_End, `Total Revenue`)) + geom_col(aes(fill = C(Dues, `Total Retail`, `Other Revenue`)))
Thank you so much for your help
Update after clarification:
library(tidyverse)
library(lubridate)
df %>%
  mutate(Month = mdy(Month)) %>% # this line is not necessary in OP's original code (not the one presented here)
  pivot_longer(
    cols = c("Dues", "TotalRetail", "OtherRevenue"),
    # cols = -c(Month_End, SID) in OP's original code
    names_to = "names",
    values_to = "values"
  ) %>%
  mutate(percent = values / TotalRevenue * 100) %>%
  ggplot(aes(x = Month, y = values, fill = names)) +
  geom_col() +
  geom_text(aes(label = paste0(round(percent, 1), "%")),
            position = position_stack(vjust = 0.5), size = 5)
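The code above assumes column names without spaces (TotalRevenue, TotalRetail, OtherRevenue). If the columns keep the spaces shown in the question's table, a version along the following lines should work. It is a sketch built only from the four sample rows in the question; the names `revenue`, `revenue_long`, `source` and `amount` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample rows copied from the question. Column names containing spaces
# have to be wrapped in backticks.
revenue <- tibble::tibble(
  Month = as.Date(c("2020-08-31", "2020-09-30", "2020-10-31", "2020-11-30")),
  `Total Revenue` = c(36615.00, 38096.69, 43594.15, 51856.90),
  Dues = c(30825, 34322, 35936, 43432),
  `Total Retail` = c(1200, 2779.40, 2074.68, 993.50),
  `Other Revenue` = c(4590, 995.29, 5583.47, 7431.40)
)

revenue_long <- revenue %>%
  # Stack only the three components; Total Revenue stays as its own column
  # so it can be used as the denominator.
  pivot_longer(
    cols = c(Dues, `Total Retail`, `Other Revenue`),
    names_to = "source",
    values_to = "amount"
  ) %>%
  mutate(percent = amount / `Total Revenue` * 100)

p <- ggplot(revenue_long, aes(x = Month, y = amount, fill = source)) +
  geom_col() +
  geom_text(aes(label = paste0(round(percent, 1), "%")),
            position = position_stack(vjust = 0.5), size = 3)
```

Because Total Revenue is excluded from the pivot, it is not stacked as a fourth segment, and the three percentages for each month add up to 100.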
First answer:
You were almost there. Pivot longer and add fill.
library(tidyverse)
library(lubridate)
df %>%
  mutate(Month = mdy(Month)) %>%
  pivot_longer(
    -Month,
    names_to = "names",
    values_to = "values"
  ) %>%
  ggplot(aes(x = Month, y = values, fill = names)) +
  geom_col()

create plot in ggplot for each unique value in a row in r

I have a dataframe like this:
library(tidyverse)
my_data <- tibble(name = c("Justin", "Janet", "Marisa"),
x = c(100, 50, 75),
y = c(2, 3, 6))
Each name is unique, and I want to make a bar graph for each person without having to do it line by line. I also want to save each plot as a unique object because I'll be inputting it into a power point using the officer package. Last, the names won't always be the same, but each name will always be unique.
For instance, I want one plot for Janet, one plot for Justin, and one plot for Marisa. I don't want them faceted but instead as their own objects.
Any thoughts?
We can get the data in long format first and for each individual name create the plot.
library(tidyverse)
long_data <- my_data %>% tidyr::pivot_longer(cols = -name, names_to = 'col')
plots_list <- map(unique(my_data$name), ~long_data %>%
                    filter(name == .x) %>%
                    ggplot() + aes(name, value, fill = col) +
                    geom_bar(stat = 'identity', position = 'dodge') +
                    scale_fill_manual(values = c('red', 'blue')) +
                    ggtitle(paste0('Plot for ', .x)))
This will return list of plots where individual plots can be accessed via plots_list[[1]], plots_list[[2]] etc.
plots_list[[1]]
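Since the names change from run to run, it can be handy to look plots up by person instead of by position. One possible variation (a sketch, not part of the answer above) is to name the input vector with purrr's set_names() before mapping, so the resulting list carries the same names. The object names `person_names` and `plots_by_name` are made up for illustration.

```r
library(tidyverse)

my_data <- tibble(name = c("Justin", "Janet", "Marisa"),
                  x = c(100, 50, 75),
                  y = c(2, 3, 6))

long_data <- my_data %>%
  pivot_longer(cols = -name, names_to = "col")

person_names <- unique(my_data$name)

# set_names() with no second argument names each element after itself,
# and map() keeps those names on the list it returns.
plots_by_name <- person_names %>%
  set_names() %>%
  map(~ long_data %>%
        filter(name == .x) %>%
        ggplot(aes(name, value, fill = col)) +
        geom_col(position = "dodge") +
        ggtitle(paste0("Plot for ", .x)))
```

Each plot is then available as plots_by_name[["Janet"]], plots_by_name$Marisa and so on, which keeps working when the set of names changes.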

Overlay a frequency polygon over a bar plot with non count stat

I would like to overlay a frequency polygon over a bar plot where stat = 'identity' and not count. This answer works with count data but not when you are using summarised data. Take this example below:
Data
data <- tibble(my_factors = c(1,1,1,2,2,2,3,3,3,4,4,4),
total = c(10,20,30,40,50,60,70,80,90,100,110,120))
Group by factors and plot total
data %>%
  group_by(my_factors) %>%
  summarise(total = sum(total)) %>%
  ungroup() %>%
  ggplot(aes(my_factors, total)) +
  geom_bar(stat = 'identity')
[Image: desired output]
In this case it's a fairly linear line but would a 'smooth' line also be possible?

ggalluvial: How do I plot an alluvial diagram when I have a dataframe with links and nodes?

I have this dataframe with timepoints (a, b and c), labels (l1, l2, l3) and frequencies that are distributed over the timepoints and labels.
I want to create a sankey diagram with the ggalluvial package in R.
Here's some code:
library(tidyverse)
library(forcats)
library(ggalluvial)
library(magrittr)
plotAlluvial <- function(.df, name = freq) {
  y_name <- enquo(name)
  ggplot(.df,
         aes(
           x = tp,
           stratum = lbl,
           alluvium = id,
           label = lbl,
           fill = lbl,
           y = !!y_name
         )
  ) +
    geom_stratum() +
    geom_flow(stat = "flow", color = "darkgray") +
    geom_text(stat = "stratum") +
    scale_fill_brewer(type = "qual", palette = "Set2")
}
x1 <- c(6, 0, 0, 5, 5, 4, 2, 0, 3)
x2 <- c(5, 5, 3, 0, 0, 5, 0, 7, 0)
# tibble() replaces the deprecated data_frame()
df <- tibble(tp1 = rep(c('a', 'b'), each = 9),
             lbl1 = c(rep(c('l1', 'l2', 'l3'), 2, each = 3)),
             tp2 = rep(c('b', 'c'), each = 9),
             lbl2 = c(rep(c('l1', 'l2', 'l3'), 6)),
             freq = c(x1, x2)
)
df2 <- df %>%
  mutate(id = row_number()) %>%
  unite(un1, c(tp1, lbl1)) %>%
  unite(un2, c(tp2, lbl2)) %>%
  tidyr::gather(key, value, -c(freq, id)) %>%
  separate('value', c('tp', 'lbl'))
df2.left <- df2 %>%
  dplyr::filter(!(key == 'un1' & tp == 'b'))
df2.right <- df2 %>%
  dplyr::filter(!(key == 'un2' & tp == 'b'))
I can plot the left side and plot the right side of the diagram I want:
plotAlluvial(df2.left)
plotAlluvial(df2.right)
But if I try to plot the left and right side at the same time I get this plot:
plotAlluvial(df2)
When I use the code above, the stratum at timepoint b contains too many frequencies. It should be as high as the other two strata, i.e. have a height of 25.
What am I doing wrong? How can I create a diagram that combines the first two plots?
EDIT:
After a comment I added a proportion of the frequencies variable. Now the stratum b is of the correct height but the incoming and outgoing flows still only occupy 50% of each condition in timepoint b.
df2 %<>% group_by(tp) %>% mutate(prop = freq / sum(freq)) %>%
ungroup()
plotAlluvial(df2,prop)

ggplot2() bar chart and dplyr() grouped and overall data in R

I'd like to make a stacked proportional bar chart representing the prevalence of diabetes in a cohort of individuals residing in towns A, B, and C. I'd also like the plot to feature a bar representing the entire cohort.
I'm happy with the plot below, but I'd like to know if there is a way of incorporating the pre-processing step into the processing step, i.e. piping it with dplyr?
Thanks!
Starting point (df):
dfa <- data.frame(town=c("A","A","A","B","B","C","C","C","C","C"),diabetes=c("y","y","n","n","y","n","y","n","n","y"),heartdisease=c("n","y","y","n","y","y","n","n","n","y"))
Pre-processing:
dfb <- rbind(dfa, transform(dfa, town = "ALL"))
Processing and plot:
library(dplyr)
library(ggplot2)
dfc <- dfb %>%
  group_by(town) %>%
  count(diabetes) %>%
  mutate(prop = n / sum(n))
ggplot(dfc, aes(x = town, y = prop, fill = diabetes)) +
  geom_bar(stat = "identity") +
  coord_flip()
Like this:
dfc <- dfa %>%
  bind_rows(dfa %>%
              mutate(town = "ALL")) %>%
  group_by(town) %>%
  count(diabetes) %>%
  mutate(prop = n / sum(n)) %>%
  ggplot(aes(x = town, y = prop, fill = diabetes)) +
  geom_bar(stat = "identity") +
  coord_flip()
EDIT: added pre-processing into pipeline using bind_rows and mutate instead of rbind and transform
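One thing to be aware of: because the pipeline above ends in ggplot(), dfc ends up holding the plot object rather than the summarised data. If the proportions are also needed as a data frame, a possible variation is to stop the pipeline before plotting. This is a sketch; the names `prop_data` and `town_plot` are made up for illustration, and the heartdisease column is left out because it is not used in the plot.

```r
library(dplyr)
library(ggplot2)

dfa <- data.frame(
  town     = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
  diabetes = c("y", "y", "n", "n", "y", "n", "y", "n", "n", "y")
)

# Data step: append a copy of every row relabelled "ALL", then compute
# the share of each diabetes status within each town.
prop_data <- dfa %>%
  bind_rows(mutate(dfa, town = "ALL")) %>%
  count(town, diabetes) %>%
  group_by(town) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Plot step, kept separate so prop_data remains available for inspection.
town_plot <- ggplot(prop_data, aes(x = town, y = prop, fill = diabetes)) +
  geom_col() +
  coord_flip()
```

Keeping the two steps apart makes it easy to check, for example, that the proportions within each town sum to 1 before drawing the chart.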
