Plot a stacked barplot - amended - r

I have 4 dataframes, which all have a column called Results showing Wins, Draws, Losses. I would like to create a layered histogram as the picture below. Any idea if it is achievable in R?
This is what I was playing with:
ggplot(results, aes(x = Country, y = ??)) +
geom_bar(aes(fill = Performance), stat = "identity")
The problem is that I don't know what to map to the y axis. The bars are supposed to show counts.
Another option I tried which is almost what I want is this:
counts <- table(results$Performance, results$Country)
barplot(counts, main="Game Count per Football Team",
xlab="Football Teams", ylab = "Game Count", col=c("darkblue","red", "Yellow"),
legend = rownames(counts))
However, the y axis stops at 800, although one of the countries has up to 908 observations.

Here is some code that shows one way you could do this. You basically just want a separate geom_bar layer for each of your four data frames.
To demonstrate, I'll create two different dataframes from the mpg dataset that comes with the ggplot2 package, because you didn't provide any data.
library(tidyverse)
# I'm making two different data frames from the
# 'mpg' dataset, which comes with the ggplot2 package
mpg$year = as.character(mpg$year)
df1 = filter(mpg, year == "1999")
df2 = filter(mpg, year == "2008")
plot = ggplot() +
geom_bar(data=df1
, aes(x = year, y = hwy, fill = manufacturer)
, stat = "identity") +
geom_bar(data=df2
, aes(x = year, y = hwy, fill = manufacturer)
, stat = "identity")
print(plot)
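Since the question asks for counts, here is a minimal sketch of another route, assuming the separate data frames can be stacked into one. The data frames england and france and their contents are invented for illustration; only the Performance and Country column names come from the question. When no y aesthetic is given, geom_bar() counts the rows in each group by default, which sidesteps the "what do I put on y" problem.

```r
library(ggplot2)

# Hypothetical stand-ins for the per-country data frames (invented data)
england <- data.frame(Performance = c("Win", "Win", "Draw", "Loss"))
france  <- data.frame(Performance = c("Win", "Loss", "Loss"))

# Tag each data frame with its country, then stack them into one
results <- rbind(
  cbind(england, Country = "England"),
  cbind(france,  Country = "France")
)

# With no y aesthetic, geom_bar() counts the rows in each group itself
p <- ggplot(results, aes(x = Country, fill = Performance)) +
  geom_bar() +
  labs(x = "Football Teams", y = "Game Count")
print(p)
```

The same pattern should extend to four data frames by adding two more cbind() calls inside rbind().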

Related

How do you create a plot from two different data frames (or how do you combine data frames with identical column names)

I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so
df2019 <- data.frame(Institute = c("A","B","C"),Women = c(65,50,70),Men = c(35,50,30))
df2016 <- data.frame(Institute = c("A","B","C"),Women = c(70,45,50),Men = c(30,55,50))
df2019_melted <- reshape2::melt(df2019)
ggplot(data = df2019_melted, aes(x = Institute, y = value, fill = variable))+
geom_bar(stat = "identity", position = "dodge")+
labs(fill = "Gender")+
xlab("Institute")+
ylab("Percent")+
scale_fill_discrete(labels = c("Women","Men"))+
ggtitle("Overall Gender Composition 2019")
However, I want the plot to show 2016 in faded bars, grouped the same way as 2019, so 4 bars for each Institute.
Since the column names are the same for all of my dataframes, I can't use rbind() or similar, because the combined result doesn't record which dataframe each row came from.
Add a column for year to your data frames and then combine and melt. ggplot prefers everything to be in one data.frame.
all_melted <- reshape2::melt(
rbind(cbind(df2019, year=2019), cbind(df2016, year=2016)),
id=c("year", "Institute"))
Then you can plot with something like this, mapping year to alpha to make "faded" bars
ggplot(all_melted, aes(x = Institute, y = value, fill = variable, alpha=factor(year)))+
geom_col(position = "dodge")+
labs(fill = "Gender")+
xlab("Institute")+
ylab("Percent")+
scale_alpha_discrete(range=c(.4, 1), name="Year") +
ggtitle("Overall Gender Composition")

How to make a dual axis in ggplot R

I have made a time series plot for total count data of 4 different species. As you can see, the sharksucker results have a much higher count than the other 3 species. To see the trends of the other 3 species, they need to be plotted separately (or on a smaller y axis). However, I have a figure limit in my master's paper, so I was trying to create a dual axis plot or split the y axis into two. Does anyone know of a way I could do this?
library(tidyverse)
library(readxl) # provides read_xlsx(); not attached by library(tidyverse)
library(reshape2)
dat <- read_xlsx("ReefPA.xlsx")
dat1 <- dat
dat1$Date <- format(dat1$Date, "%Y/%m")
plot_dat <- dat1 %>%
group_by(Date) %>%
summarise(Sharksucker_Remora = sum(Sharksucker_Remora)) %>%
melt("Date") %>%
filter(Date > '2018-01-01') %>%
arrange(Date)
names(plot_dat) <- c("Date", "Species", "Count")
ggplot(data = plot_dat) +
geom_line(mapping = aes(x = Date, y = Count, group = Species, colour = Species)) +
stat_smooth(method=lm, aes(x = Date, y = Count, group = Species, colour = Species)) +
scale_colour_manual(values=c(Golden_Trevally="goldenrod2", Red_Snapper="firebrick2", Sharksucker_Remora="darkolivegreen3", Juvenile_Remora="aquamarine2")) +
xlab("Date") +
ylab("Total Presence Per Month") +
theme(legend.title = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
The problem you're trying to solve doesn't look like a second y axis issue. It is a matter of the relative scale of the species. You might want to think about standardizing each species' initial presence to 100 and showing growth or decline from there.
Another option would be faceting by species.
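To make those two suggestions concrete, here is a rough sketch of both, using invented numbers in the same long format as plot_dat (Date, Species, Count). The Index column is a hypothetical name introduced here, and the sketch assumes the rows are already sorted by date within each species.

```r
library(ggplot2)

# Invented long-format data: one row per month and species
plot_dat <- data.frame(
  Date = rep(as.Date(c("2018-02-01", "2018-03-01", "2018-04-01")), times = 2),
  Species = rep(c("Sharksucker_Remora", "Red_Snapper"), each = 3),
  Count = c(120, 150, 135, 4, 7, 5)
)

# Option 1: index each species to 100 at its first observation
# (assumes rows are sorted by Date within each Species)
plot_dat$Index <- ave(plot_dat$Count, plot_dat$Species,
                      FUN = function(v) 100 * v / v[1])

p_index <- ggplot(plot_dat, aes(x = Date, y = Index, colour = Species)) +
  geom_line() +
  ylab("Presence (first month = 100)")

# Option 2: one panel per species, each with its own y axis range
p_facet <- ggplot(plot_dat, aes(x = Date, y = Count, colour = Species)) +
  geom_line() +
  facet_wrap(~ Species, ncol = 1, scales = "free_y") +
  ylab("Total Presence Per Month") +
  theme(legend.position = "none")

print(p_index)
print(p_facet)
```

The faceted version still counts as a single figure, which may help with the figure limit.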

Make different plots for each column in dataframe in one window r

For each variable in my dataframe I make a histogram, line plot, and boxplot to assess its distribution, and I plot these graphs in one window.
For variable VARIABLE my code looks like:
variable_name_string = "VARIABLE"
hist = qplot(VARIABLE, data = full_data_noNO, geom="histogram",
fill=I("lightblue"))+
theme_light()
avg_price = full_data_noNO %>%
group_by(Month, Country) %>%
dplyr::summarize(avg = mean(VARIABLE, na.rm =
TRUE))
#line graph for different countries over time
line = ggplot(data=avg_price, aes(x=anydate(Month), y=VARIABLE,
group=Country)) +
xlab("Date")+
ylab(variable_name_string)+
geom_line(aes(color=Country), size = 1)+
theme_light()
#boxplot over different years
avg_price2 = avg_price
avg_price2$Month = format(as.Date(anydate(avg_price$Month), "%Y-%m-%d"),
"%Y")
box = ggplot(avg_price2, aes(x = Month, y=VARIABLE, fill = Month)) +
geom_boxplot()+
xlab("Date")+
ylab(variable_name_string)+
guides(fill=FALSE)+
theme_light()
var_name = grid.text(variable_name_string, gp=gpar(fontsize=20))
#merge plot into one window
grid.arrange(var_name, hist, line, box, ncol=2)
This works fine for one variable, but now I want to do this for every variable in my dataframe and save the merged plot window for all variables. I have been looking for almost the entire day but I cannot find a solution. Can anyone help me?
Without a reproducible example it is hard to help, but you could try wrapping your plotting code in a function and using lapply to call the function repeatedly for all your variables.
make_plots <- function (variable_string) {
var_quo <- rlang::sym(variable_string)
hist = qplot(!!var_quo, data = full_data_noNO, geom="histogram",
fill=I("lightblue"))+
theme_light()
avg_price = full_data_noNO %>%
group_by(Month, Country) %>%
dplyr::summarize(avg = mean(!!var_quo, na.rm =
TRUE))
#line graph for different countries over time
line = ggplot(data=avg_price, aes(x=anydate(Month), y=avg,
group=Country)) +
xlab("Date")+
ylab(variable_string)+
geom_line(aes(color=Country), size = 1)+
theme_light()
#boxplot over different years
avg_price2 = avg_price
avg_price2$Month = format(as.Date(anydate(avg_price$Month), "%Y-%m-%d"),
"%Y")
box = ggplot(avg_price2, aes(x = Month, y=avg, fill = Month)) +
geom_boxplot()+
xlab("Date")+
ylab(variable_string)+
guides(fill="none")+
theme_light()
# title grob; textGrob() builds the text without drawing it immediately
var_name = textGrob(variable_string, gp=gpar(fontsize=20))
#merge plot into one window
combined <- grid.arrange(var_name, hist, line, box, ncol=2)
# Save combined plot at VARIABLE_plots.pdf
ggsave(paste0(variable_string, "_plots.pdf"), combined)
combined
}
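# Pass the variable names as a character vector; naming the vector
# names the resulting list, so plots can be looked up by variable name
vars <- c("VARIABLE1", "VARIABLE2")
plots <- lapply(setNames(vars, vars), make_plots)
# OR, for every column (note this includes Month and Country as well)
# plots <- lapply(setNames(colnames(full_data_noNO), colnames(full_data_noNO)), make_plots)
# Individual combined plots can be redrawn with grid::grid.draw()
grid::grid.draw(plots[["VARIABLE1"]])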

Why is ggplot ignoring my factor levels when I subset my data?

I am using some code I got from an answer to a previous question, but I ran into a funny problem and I'd like some expert insight into what is going on. I am trying to plot monthly deviations from an annual mean using bar charts. Specifically, I am coloring the bars differently depending on whether the monthly mean is above or below the annual mean. I am using the txhousing dataset, which is included with the ggplot2 package.
I thought I could use a factor to denote whether or not this is the case. The months are correctly ordered when I only plot a subset of the data (the "lower" values), but when I add another plot, ggplot rearranges all of the months to be alphabetical. Does anyone know why this happens, and what a workaround would be?
Thank you so much for any input! Criticism of my code is welcome :)
Reproducible Examples
1. Using just one plot
library(tidyverse)
# subset txhousing to the year 2014, and calculate monthly means and dates
housing_df <- filter(txhousing, year == 2014) %>%
group_by(year, month) %>%
summarise(monthly_mean = mean(sales, na.rm = TRUE),
date = first(date)) %>%
mutate(month = factor(month.abb[month], levels = month.abb, ordered = TRUE),
salesdiff = monthly_mean - mean(monthly_mean), # monthly deviation
higherlower = case_when(salesdiff >= 0 ~ "higher",
salesdiff < 0 ~ "lower"))
ggplot(data = housing_df, aes(x = month, y = salesdiff, higherlower)) +
geom_col(data = filter(housing_df, higherlower == "higher"), aes(y = salesdiff, fill = higherlower)) +
scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
theme_bw() +
theme(legend.position = "none") # remove legend
2. Using two plots with all of the data:
ggplot(data = housing_df, aes(x = month, y = salesdiff, higherlower)) +
geom_col(data = filter(housing_df, higherlower == "higher"), aes(y = salesdiff, fill = higherlower)) +
geom_col(data = filter(housing_df, higherlower == "lower"), aes(y = salesdiff, fill = higherlower)) +
scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
theme_bw() +
theme(legend.position = "none") # remove legend
There are multiple ways to do this, but I find it a bit of trial and error. You are already doing the most common fix, which is to convert month into a factor, and that's why the first plot works. Why it does not work in the second case is a bit of a mystery, but try adding + scale_x_discrete(limits= housing_df$month) to override the x axis order and see if that works.
I agree with the other comments that the best way would be to not even use the extra layer, as it's not needed in this specific case, but the above solution works even when there are multiple layers.
Additionally, + scale_x_discrete(drop = FALSE) also overrides potentially different factor levels from different data sources in your ggplot.
This topic is also addressed here: https://github.com/tidyverse/ggplot2/issues/577
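For the single-layer approach mentioned above, a minimal sketch might look like the following. The data frame here is an invented stand-in for housing_df with made-up salesdiff values; the point is only that mapping fill to higherlower in one geom_col() layer keeps the factor order of month, with no subsetting.

```r
library(ggplot2)

# Invented stand-in for housing_df: 12 months with made-up deviations
housing_df <- data.frame(
  month = factor(month.abb, levels = month.abb),
  salesdiff = c(-40, -30, -5, 10, 25, 35, 30, 20, 5, -10, -20, -20)
)
housing_df$higherlower <- ifelse(housing_df$salesdiff >= 0, "higher", "lower")

# One layer only: fill is mapped from higherlower, so no filtering is needed
p <- ggplot(housing_df, aes(x = month, y = salesdiff, fill = higherlower)) +
  geom_col() +
  scale_fill_manual(values = c(higher = "blue", lower = "red")) +
  theme_bw() +
  theme(legend.position = "none") # remove legend
print(p)
```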

How to graph "before and after" measures using ggplot with connecting lines and subsets?

I'm totally new to ggplot, relatively fresh with R, and want to make a smashing "before-and-after" scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I've tried some options, but have yet to:
show each individual observation separately (now same values are overlapping)
connect the related before and after measures (x=0 and x=1) with lines to more clearly illustrate the direction of variation
subset the data along class and id using shape and colors
How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?
Main alternative: geom_point()
Here is some sample data and example code using geom_point()
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of "feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then I cannot subset using colors and shapes because of the overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to clarify the overlapping observations that arise from using geom_point(), as in the example below. However, I have yet to understand how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
grid.arrange(p1,p2, nrow=1) # GridExtra package
Or maybe it is better to summarize the data by x, id, and class as the mean/median of y, filter out ids producing NAs (e.g. ids 3 and 6), and connect the points with lines? If you don't really need to show the variability within each id (which may be the case if the plot only illustrates tendencies), you can do it this way:
library(ggplot2)
library(dplyr)
#library(ggthemes)
df <- df %>%
group_by(x, id, class) %>%
summarize(y = median(y, na.rm = T)) %>%
ungroup() %>%
mutate(
id = factor(id),
x = factor(x, labels = c("before", "after")),
class = factor(class, labels = c("multiple days", "one day")), # 0 = multiple days, 1 = one day
) %>%
group_by(id) %>%
mutate(nas = any(is.na(y))) %>%
ungroup() %>%
filter(!nas) %>%
select(-nas)
ggplot(df, aes(x = x, y = y, col = id, group = id)) +
geom_point(aes(shape = class)) +
geom_line(show.legend = F) +
#theme_few() +
#theme(legend.position = "none") +
ylab("Feelings of peace, %") +
xlab("")
Here's one possible solution for you.
First - to get the color and shapes determined by variables, you need to put these into the aes function. I turned several into factors, so the labs function fixes the labels so they don't appear as "factor(x)" but just "x".
To address multiple points, one solution is to use geom_smooth with method = "lm". This plots the regression line, instead of connecting all the dots.
The option se = FALSE prevents confidence intervals from being plotted - I don't think they add a lot to your plot, but play with it.
Connecting the dots is done by geom_line - feel free to try that as well.
Within geom_point, the option position = position_jitter(width = .1) adds random noise to the x-axis so points do not overlap.
ggplot(df, aes(x=factor(x), y=y, color=factor(id), shape=factor(class), group = id)) +
geom_point(position = position_jitter(width = .1)) +
geom_smooth(method = 'lm', se = FALSE) +
labs(
x = "x",
color = "ID",
shape = 'Class'
)
