Plot based on descending value of a variable - r

I want to create a plot that shows the relationship between countries (categorical), their government type (4 categories, including NA), and the proportion of covid deaths to population. I want to show the 30 countries with the highest death proportion and if there is a relationship with the government type.
Right now the countries are plotted in alphabetical order, but I would like to plot the death proportion in descending order. I can't seem to figure out how to do this. Thanks!
library(tidyverse)
library(lubridate)
library(readr)
# Columns of interest: Governmental System, Country, Proportion of Deaths to Population
covid_data <- read_csv(here::here("data/covid_data.csv"))
covid_data <- covid_data %>%
mutate(death_proportion = total_deaths / population)
covid_data[with(covid_data, order(-death_proportion)), ] %>%
head(30) %>%
ggplot(aes(x = death_proportion,
y = country,
color = government)) +
geom_point()

I think you just need to use forcats::fct_reorder to set the order of your countries by the plotting variable.
Check this example:
library(tidyverse)
mtcars %>%
rownames_to_column(var = "car_name") %>%
mutate(car_name = fct_reorder(car_name, desc(mpg))) %>%
ggplot(aes(x = mpg,
y = car_name,
color = factor(cyl))) +
geom_point()
Created on 2021-03-16 by the reprex package (v1.0.0)
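Applied to the question's own columns, the same idea might look like the sketch below. The `covid_toy` data frame and its values are invented stand-ins for `covid_data`, and `slice_max()` is swapped in for the `order()`/`head()` step. Because `fct_reorder()` sorts levels in ascending order and ggplot2 draws the first level at the bottom of a y-axis, the highest proportion ends up at the top.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical stand-in for covid_data; all values are invented for illustration
covid_toy <- tibble(
  country      = c("A", "B", "C", "D", "E"),
  government   = c("Republic", "Monarchy", NA, "Republic", "Monarchy"),
  total_deaths = c(50, 10, 30, 5, 20),
  population   = c(1000, 1000, 1000, 1000, 1000)
)

top_countries <- covid_toy %>%
  mutate(death_proportion = total_deaths / population) %>%
  # keep the n rows with the highest proportion (n = 30 in the question)
  slice_max(death_proportion, n = 3) %>%
  # order the country levels by the plotted value
  mutate(country = fct_reorder(country, death_proportion))

ggplot(top_countries,
       aes(x = death_proportion, y = country, color = government)) +
  geom_point()
```

Wrapping the ordering variable in `desc()`, as in the mtcars example, flips the direction.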

Related

How to Create A Stacked Column Plot of Multiple Variables in R (ggplot2)

Currently, I have a data frame that looks like this:
Month Total Revenue Dues Total Retail Other Revenue
8/31/2020 36615.00 30825 1200 4590
9/30/2020 38096.69 34322 2779.4 995.29
10/31/2020 43594.15 35936 2074.68 5583.47
11/30/2020 51856.9 43432 993.5 7431.4
I want to create a plot (which I imagine should be a stacked column) in ggplot that shows the revenue mix by type for each month. For my data, Total Revenue is the sum of dues, total retail and other revenue. Dues, total retail and other revenue should stack on top of each other, each having its own colour. I also want labels on the column chart describing what percentage of the total revenue is from each source of income.
I can plot the total revenue with no issues, but I cannot seem to wrap my head around splitting the columns up. My only successful example so far is as follows.
# Create Column Plot of Total Revenue
library(tidyverse)
plot1 <- ggplot(August_Data, aes(Month_End, `Total Revenue`)) + geom_col()
This example obviously does not split the revenue into the correct subcategories. I thought that using the fill aesthetic might work; however, I get an error.
plot1 <- ggplot(August_Data, aes(Month_End, `Total Revenue`)) + geom_col(aes(fill = C(Dues, `Total Retail`, `Other Revenue`)))
Thank you so much for your help
Update after clarification:
library(tidyverse)
library(lubridate)
df %>%
mutate(Month = mdy(Month)) %>% # this line is not necessary in OPs original code (not the one presented here)
pivot_longer(
cols = c("Dues", "TotalRetail", "OtherRevenue"),
# cols = -c(Month_End, SID) in OPs original code
names_to = "names",
values_to = "values"
) %>%
mutate(percent = values/TotalRevenue*100) %>%
ggplot(aes(x = Month, y= values, fill= names))+
geom_col() +
geom_text(aes(label = paste0(round(percent,1),"%")),
position = position_stack(vjust = 0.5), size = 5)
First answer:
You were almost there. Pivot longer and add fill.
library(tidyverse)
library(lubridate)
df %>%
mutate(Month = mdy(Month)) %>%
pivot_longer(
-Month,
names_to = "names",
values_to = "values"
) %>%
ggplot(aes(x = Month, y= values, fill= names))+
geom_col()
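To try the updated pipeline end to end, here is a sketch that rebuilds the four rows from the question. The column names without spaces (`TotalRevenue`, `TotalRetail`, `OtherRevenue`) are an assumption that matches the answer's code rather than the original file. Note that pivoting with `-Month`, as in the first answer, also stacks the total itself, so the three component columns are named explicitly here.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# The four rows shown in the question (column names without spaces are assumed)
df <- tibble(
  Month        = c("8/31/2020", "9/30/2020", "10/31/2020", "11/30/2020"),
  TotalRevenue = c(36615.00, 38096.69, 43594.15, 51856.90),
  Dues         = c(30825, 34322, 35936, 43432),
  TotalRetail  = c(1200, 2779.40, 2074.68, 993.50),
  OtherRevenue = c(4590, 995.29, 5583.47, 7431.40)
)

revenue_long <- df %>%
  mutate(Month = mdy(Month)) %>%
  # one row per month and revenue type; TotalRevenue stays as its own column
  pivot_longer(c(Dues, TotalRetail, OtherRevenue),
               names_to = "names", values_to = "values") %>%
  mutate(percent = values / TotalRevenue * 100)

ggplot(revenue_long, aes(x = Month, y = values, fill = names)) +
  geom_col() +
  geom_text(aes(label = paste0(round(percent, 1), "%")),
            position = position_stack(vjust = 0.5))
```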

Making the X_axis more visible?

This is the code I used, the goal is to visualize the evolution of covid in north africa
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
ggplot(aes(x = date, y= new_cases,color = location, group = location)) +
geom_line()
This is the dataset I used
As you can see, the data is day-to-day, so the x-axis is quite condensed.
And this is the plot
You can't read anything on the x-axis. I want to be able to discern the dates, maybe by scaling the axis in weeks or months instead of days.
I converted the date column from string to Date type, as the comments suggested, and it all worked out:
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
mutate(date = as.Date(date))%>%
ggplot(aes(x = date, y= new_cases,color = location, group = location)) +
geom_line()
this is the plot after modification.
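Once the column is a real Date, the tick spacing can also be controlled explicitly, which is what the question asked about (weeks or months instead of days). A minimal sketch, using invented daily counts in place of the OWID file:

```r
library(ggplot2)

# Invented daily series standing in for owid-covid-data.csv
covid_toy <- data.frame(
  date = as.character(seq(as.Date("2021-01-01"), as.Date("2021-06-30"), by = "day")),
  location = "Tunisia",
  stringsAsFactors = FALSE
)
covid_toy$new_cases <- seq_len(nrow(covid_toy))

# read.csv() returns dates as strings, so convert before plotting
covid_toy$date <- as.Date(covid_toy$date)

p <- ggplot(covid_toy,
            aes(x = date, y = new_cases, color = location, group = location)) +
  geom_line() +
  # one tick per month, labelled like "Jan 2021"
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
p
```

Using `date_breaks = "1 week"` gives weekly ticks instead; rotating the labels with `theme(axis.text.x = element_text(angle = 45, hjust = 1))` can help if they overlap.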

Ggplot - How to present the mean of a third variable?

Let's say I have this data frame:
The data frame
I want to make a graph which presents for each SES (Social Economy Status) what is the mean income for females and what is the mean income for males.
I have so far this code:
ggplot(incomeSorted, aes(GENDER)) +
scale_y_continuous("Mean")+
geom_bar(position = "dodge")+
facet_wrap("SES")
and this is the output:
How do I make the graph present the mean income instead of counting the number of females and males in each category?
Thanks ahead!
If you want to display mean income, you have to compute it. You can use dplyr's group_by() with summarise() to obtain the key variable and then plot it. Here is some code for the task:
library(ggplot2)
library(dplyr)
#Data
df <- data.frame(id=1:8,Gender=c(rep('Female',4),rep(c('Male','Female'),2)),
income=c(73,150,220.18,234,314.16,983.1,1001,1012),
SES=c('Bottom','Bottom','Middle','Middle','Middle',
'Upper','Upper','Upper'),
stringsAsFactors = F)
#Compute and plot
df %>% group_by(SES,Gender) %>%
summarise(MeanIncome=mean(income,na.rm=T)) %>%
ggplot(aes(x=Gender,y=MeanIncome)) +
scale_y_continuous("Mean")+
geom_bar(stat = 'identity')+
facet_wrap(.~SES)
Output:
Or you can avoid facets and display the plot with a fill variable like this:
#Code 2
df %>% group_by(SES,Gender) %>%
summarise(MeanIncome=mean(income,na.rm=T)) %>%
ggplot(aes(x=Gender,y=MeanIncome,fill=SES)) +
scale_y_continuous("Mean")+
geom_bar(stat = 'identity',position = position_dodge2(0.9,preserve = 'single'))
Output:
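As an alternative to summarising first, ggplot2 can compute the mean inside the plot with `stat_summary()`. This is a sketch of that swapped-in technique, reusing the sample data from the answer above:

```r
library(ggplot2)

# Same sample data as in the answer above
df <- data.frame(id = 1:8,
                 Gender = c(rep("Female", 4), rep(c("Male", "Female"), 2)),
                 income = c(73, 150, 220.18, 234, 314.16, 983.1, 1001, 1012),
                 SES = c("Bottom", "Bottom", "Middle", "Middle", "Middle",
                         "Upper", "Upper", "Upper"),
                 stringsAsFactors = FALSE)

# stat_summary() applies `fun` to the y values within each x group and facet
p <- ggplot(df, aes(x = Gender, y = income)) +
  stat_summary(fun = mean, geom = "bar") +
  scale_y_continuous("Mean") +
  facet_wrap(~SES)
p
```

This keeps the raw data frame untouched, at the cost of the means not being available as a table for labels or further checks.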

bicolor heatmap with factor levels

I have this dataframe:
set.seed(0)
df <- data.frame(id = factor(sample(1:100, 10000, replace=TRUE), levels=1:100),
year = factor(sample(1950:2019, 10000, replace=TRUE), levels=1950:2019)) %>% unique() %>% arrange(id, year)
And I'm looking to plot a heatmap where the ids are on the X-axis, the years are on the Y-axis, and the color is blue when the data point exists and red when it doesn't. I'm almost there, but I can't figure out how to change the fill argument for the two colors:
ggplot(df, aes(id, year, fill= year)) +
geom_tile()
The objective to plot both variables as factors is to plot them even when some year doesn't have any id (and plotting its whole row as red).
EDIT:
Two things I forgot to add (hope it's not too late):
How to add alpha transparency to geom_tile() without messing it?
I need to sort the ids from maximum missings to minimum missings.
The complete() function from the tidyr package is useful for filling in missing combinations. First, set a flag variable to indicate that the data is present, then expand the data frame with the missing combinations and fill the new flag variable with FALSE:
df <- df %>%
mutate(flag = TRUE) %>%
complete(id, year, fill = list(flag = FALSE))
ggplot(df, aes(id, year, fill = flag)) +
geom_tile()
EDIT1: To add transparency, add alpha = 0.x within geom_tile(), where x is a value indicating the transparency. The lower the value, the more transparent.
EDIT2: To sort by missingness add the following code prior to the ggplot code:
# Determine the order of the IDs: fewest observed years (most missing) first
df_order <- df %>%
group_by(id) %>%
summarize(sum = sum(flag)) %>%
arrange(sum) %>%
mutate(order = row_number()) %>%
select(id, order)
# Set the IDs in order on the chart (fct_reorder() comes from forcats)
df <- df %>%
left_join(df_order, by = "id") %>%
mutate(id = fct_reorder(id, order))
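Putting the pieces of this answer together with the blue/red colors requested in the question, one possible sketch is shown below. It is not the answerer's exact code: the ordering is done in one step with `fct_reorder(id, flag, .fun = sum)`, and the names `obs` and `heat`, the legend title, and the alpha value are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

# Data from the question
set.seed(0)
obs <- data.frame(
  id   = factor(sample(1:100, 10000, replace = TRUE), levels = 1:100),
  year = factor(sample(1950:2019, 10000, replace = TRUE), levels = 1950:2019)
) %>%
  distinct()

heat <- obs %>%
  mutate(flag = TRUE) %>%
  # complete() uses all factor levels, so unobserved ids/years get rows too
  complete(id, year, fill = list(flag = FALSE)) %>%
  # lowest count of observed years first, i.e. most missing first
  mutate(id = fct_reorder(id, flag, .fun = sum))

ggplot(heat, aes(id, year, fill = flag)) +
  geom_tile(alpha = 0.7) +
  scale_fill_manual(values = c(`TRUE` = "blue", `FALSE` = "red"),
                    name = "Data present")
```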
I think you need to do some pre-processing before plotting. Create a temporary variable (data_exist) which denotes data is present for that id and year. Then use complete to fill the missing years for each id and plot it.
library(tidyverse)
df %>%
mutate(across(everything(), ~as.integer(as.character(.x)))) %>%
mutate(data_exist = 1) %>%
complete(id, year = min(year):max(year), fill = list(data_exist = 0)) %>%
mutate(data_exist = factor(data_exist)) %>%
ggplot() + aes(id, year, fill= data_exist) + geom_tile()
With expand.grid you can create a data frame with all combinations of ids and years, then left join df onto these combinations to see which ones were present:
all <- expand.grid(id=levels(df$id),year=levels(df$year)) %>%
left_join(mutate(df, present = TRUE), by = c("id", "year")) %>% # mark observed rows before joining
mutate(present=ifelse(is.na(present),'0','1'))
ggplot(all, aes(as.numeric(id), as.numeric(as.character(year)), fill= present)) + # as.character() keeps the real years on the axis
geom_tile() +
scale_fill_manual(values=c('0'='red','1'='blue')) + # change default colors
theme(legend.position="None") # hide legend

ggplot2 bar plot by two groups and mean of y variable

I'm trying to create a bar plot for which I have two groups and the y variable is the mean of one of those groups.
Sample Bar Graph
Looking at the bar graph above, the bars are grouped by country and prosocial, and the y-axis shows the fraction of prosocial individuals. However, I am only able to create a bar plot that takes the mean of prosocial and groups it by country, so there is just one bar per country, which is not what I'm looking for. This is the code I've been using to group the data for the bar plot, so far without much success:
plotData <- myData2[!is.na(myData2$prosocial),]
plotData <- plotData %>%
mutate(mean_prosocial = mean(prosocial)) %>%
group_by(country) %>%
summarise(mean_prosocial = mean(prosocial),se = sd(prosocial) / sqrt(n()))
This only groups by country and if I want to group by prosocial as well, I obviously just get NAs for the mean variable. Below is a link to the working data:
workable data.
Thanks.
Say you want to find the fraction of prosocial/non-prosocial across countries:
require(dplyr)
require(ggplot2)
First, find how many observations there are in each country. This is used later to calculate the fractions.
count_country <- myData2 %>%
filter(!is.na(prosocial)) %>%
group_by(country) %>%
summarise(n = length(country)) %>%
ungroup
Next, count the prosocial and non-prosocial observations in each country.
count_prosocial <- myData2 %>%
filter(!is.na(prosocial)) %>%
group_by(country, prosocial) %>%
summarise(n = length(prosocial)) %>%
mutate(prosocial = as.factor(prosocial))
Merge the two data frames by country name and compute the fractions:
df <- count_prosocial %>%
left_join(count_country, by = "country") %>%
mutate(frac = round(n.x / n.y, 2))
Display fractions across different countries using facet_wrap:
ggplot(data=df, aes(x=prosocial, y=frac, fill=prosocial)) +
geom_bar(stat = "identity")+
geom_text(aes(x=prosocial, y=frac, label = frac),
position = position_dodge(width = 1),
vjust = 2, size = 3, color = "white", fontface = "bold")+
facet_wrap(~country)+
labs(y = "Fraction of prosocial/non-prosocial") +
scale_fill_discrete(labels=c("Prosocial", "Individualist"))+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
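The two counting steps and the join can also be collapsed into a single pipeline with `count()` and a grouped `mutate()`. This is a sketch on invented data; `toy` is a hypothetical stand-in for `myData2`, and coding prosocial as 0/1 is an assumption.

```r
library(dplyr)

# Invented responses standing in for myData2 (prosocial coded 0/1 is assumed)
toy <- tibble(
  country   = c("X", "X", "X", "X", "Y", "Y", "Y", "Y"),
  prosocial = c(1, 1, 0, NA, 1, 0, 0, 0)
)

fractions <- toy %>%
  filter(!is.na(prosocial)) %>%
  # n = number of rows per country/prosocial combination
  count(country, prosocial) %>%
  group_by(country) %>%
  # divide by the country total, as n.x / n.y does after the join
  mutate(frac = round(n / sum(n), 2)) %>%
  ungroup() %>%
  mutate(prosocial = as.factor(prosocial))

fractions
```

The resulting `fractions` data frame has the same `country`, `prosocial`, and `frac` columns that the ggplot call above uses.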
