Split barplot using position = "fill" into separate facets

I created a bar graph in ggplot using stat = "count" and position = "fill" to show the proportional occurrence of each feature per year (below). I find the readability of this graph rather poor and therefore I'd like to split the graph into facets. However, if I add facet_wrap(~Features), it just fills the bars in every separate facet. How can I prevent this from happening?
The code for my original graph is:
data %>% ggplot(aes(x = Year, fill = Features)) + geom_bar(stat = "count", position = "fill") + theme_classic() + theme(axis.text.x = element_text(angle = 90)) + scale_y_continuous(labels = scales::percent)
I've tried:
data %>% ggplot(aes(x = Year)) + stat_count(geom = "bar", aes(y = ..prop..)) + facet_wrap(~Features) + theme_classic() + theme(axis.text.x = element_text(angle = 90))
but this calculates the proportion within the facet rather than within each year.
Any ideas how I can solve this (using ggplot, rather than by restructuring my data)?
A little about my data:
I have a data frame of features (factor) with for each feature the year (factor) in which this feature was observed. The same feature can occur several times per year, so there are several rows with the same entry for year and feature.

This should work. First, I'll make some data that has similar properties:
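library(dplyr) # provides %>%, tibble(), mutate()
library(ggplot2)
set.seed(1) # any seed works; it only makes the simulated data reproducible
labs <- c("Digital labels", "Produce ID (barcode)",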
"Smart labels", "Product Recommendation",
"Shopping list", "Product Browsing",
"Product ID (computer vision)",
"Navigation (in-store)", "Product ID (RFID)",
"Other")
years <- vector(mode="list", length=13)
years[[1]] <- c(1,2)
years[[2]] <- c(1,2,8)
years[[3]] <- c(1,2,4,10)
years[[4]] <- c(1,2,3,4,5,6,8,9,10)
years[[5]] <- c(2,3,4,5,8,10)
years[[6]] <- c(1:6, 10)
years[[7]] <- c(1:6, 10)
years[[8]] <- 1:10
years[[9]] <- c(1,3,6,9,10)
years[[10]] <- c(1:5, 7,9,10)
years[[11]] <- 1:10
years[[12]] <- c(1:6, 8:10)
years[[13]] <- c(1,2,3,6,8,9,10)
y <- 2008:2020
dat <- NULL
for(i in 1:13){
tmp <- tibble(
Features = sample(years[[i]], runif(1,600,1000), replace=TRUE),
Year = y[i]
) %>%
mutate(Features = factor(Features, levels=1:10, labels=labs))
dat <- rbind(dat, tmp)
}
Next, here's the original plot like the one you made initially.
dat %>%
ggplot(aes(x = Year, fill = Features)) +
geom_bar(stat = "count", position = "fill") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90)) +
scale_y_continuous(labels = scales::percent)
And here's how that would translate into different facets. The key is to make the percentages by hand first and then plot them directly.
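# agdat is not defined in the original answer; this is one way to build it,
# assuming pct is each feature's share of the observations within a year
agdat <- dat %>%
count(Year, Features) %>%
group_by(Year) %>%
mutate(pct = n / sum(n)) %>%
ungroup()
agdat %>% filter(Features != "Other") %>%
ggplot(aes(x=Year, y=pct)) +
geom_bar(stat="identity") +
facet_wrap(~Features, ncol=3) +
labs(x="Year", y="Percent") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90)) +
scale_y_continuous(labels = scales::percent)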
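If you want to check the "percentages by hand" step without any packages, here is a minimal base R sketch. The toy data frame and the names toy, counts, props and agg are made up for illustration and are not part of the answer above; the idea is only that the proportions are taken within each year, so they sum to 1 per year.

```r
# Toy data: one row per observation (hypothetical values for illustration)
toy <- data.frame(
  Year     = c(2019, 2019, 2019, 2019, 2020, 2020),
  Features = c("A", "A", "B", "C", "A", "B")
)

# Counts per feature (rows) and year (columns)
counts <- table(toy$Features, toy$Year)

# margin = 2 divides each column by its total,
# i.e. the proportion of each feature within a year
props <- prop.table(counts, margin = 2)

# Long format: one row per feature/year combination
agg <- as.data.frame(props, responseName = "pct")
names(agg)[1:2] <- c("Features", "Year")
agg
```

A data frame shaped like agg can then go straight into aes(x = Year, y = pct) with facet_wrap(~Features).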

Related

How to change color of moving averages in ggplot, plotting two series into one graph?

In order to highlight the moving average in my ggplot visualization, I want to give it a different color (in this case grey or black for both MA lines). When it comes to a graph representing two time series, I struggle to find the best solution. Maybe I need to take a different approach.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidyquant))
V = 365
data <- data.frame (var1 = c(rnorm(V)),
var2 = c(rnorm(V)+12),
date = c(dates <- ymd("2013-01-01")+ days(0:364))
)
data_melted <- reshape2::melt(data, id.var='date')
data_melted %>%
ggplot() +
geom_line(mapping = aes(x= date, y=value, col=variable)) +
scale_color_manual(values=c("#CC6666", "steelblue")) +
geom_ma(ma_fun = SMA, n = 30, mapping = aes(x= date, y=value, col=variable)) +
theme(axis.text.x = element_text(angle = 50, vjust = 0.5)) +
scale_x_date(date_breaks = "1 month")
I think you can get what you want by not mapping variable to color in aes() for the MA part. I instead include group = variable to tell ggplot2 that the two MA's should be separate series, but no difference in their color based on that.
data_melted %>%
ggplot() +
geom_line(mapping = aes(x= date, y=value, col=variable)) +
scale_color_manual(values=c("#CC6666", "steelblue")) +
tidyquant::geom_ma(ma_fun = SMA, n = 30, mapping = aes(x= date, y=value, group = variable), color = "black") +
theme(axis.text.x = element_text(angle = 50, vjust = 0.5)) +
scale_x_date(date_breaks = "1 month")
If you want different colors, the natural way to do this in ggplot would be to give the moving averages their own values to be mapped to color.
...
scale_color_manual(values=c("#CC6666", "#996666", "steelblue", "slateblue")) +
tidyquant::geom_ma(ma_fun = SMA, n = 30, mapping = aes(x= date, y=value, col=paste(variable, "MA"))) +
...
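For reference, here is a rough base R sketch of what a 30-period simple moving average computes, in case you prefer to calculate it yourself and draw it with a plain geom_line(). It assumes a trailing window (current value plus the previous n - 1), which is how I understand SMA to behave; x, n and ma are illustrative names and the data is simulated.

```r
set.seed(1)
x <- rnorm(365) # stand-in for one of the series
n <- 30

# Trailing simple moving average: mean of the current and the previous
# n - 1 values. sides = 1 uses past values only, so the first n - 1
# entries are NA.
ma <- as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

head(ma, 31)
```

Once the average is a column of its own, its color is just another fixed or mapped aesthetic, independent of the raw series.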
I would consider looking at the tsibble library for time series data.
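library(tsibble)
library(fabletools) # assumption: supplies the autoplot() method for tsibbles
data_melted <-as_tsibble(data_melted, key = 'variable', index = 'date')
data_melted <- data_melted %>%
group_by_key() %>% # compute the moving average within each series
mutate(
`5-MA` = slider::slide_dbl(value, mean,
.before = 2, .after = 2, .complete = TRUE)
) %>%
ungroup()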
data_melted %>%
filter(variable == "var1") %>%
autoplot(value) +
geom_line(aes(y = `5-MA`), colour = "#D55E00") +
labs(y = "y",
title = "title") +
guides(colour = guide_legend(title = "series"))

ggplot2: Keep x-axis labels in non-alphabetical order

I'm using ggplot2 to plot the annual occurrence of events in states. I want the state labels to be in the same order as shown in the data table "AZ CT NH NM DE..." but ggplot automatically reorganizes the state labels in alphabetical order "AZ CT DE NH...". I created groups so I could display ranges in "num" values (ex. NM and TN). Please ignore the group numbering--I took out some data points to make the table smaller.
ggplot(guidelines, aes(x = state, y = num, group = grp)) +
geom_point() + geom_line(linetype = "dotted") +
labs(x = "State", y = "Number") +
labs(title = "A") +
scale_y_continuous(breaks = seq(0, 11, 1),
limits=c(0,11))
I have tried the suggestions of previous posts to use factor and levels like so:
guidelines$state <- factor(guidelines$state, levels = unique(guidelines$state))
But it does not work because I am using groups and repeating state names. Any ideas on how to get around this?
We can use ordered():
library(dplyr)
library(ggplot2)
guidelines %>%
mutate(state =ordered(state, levels = unique(state))) %>%
ggplot(aes(x = state, y = num, group = grp)) +
geom_point() +
geom_line(linetype = "dotted") +
labs(x = "State", y = "Number") +
labs(title = "A") +
scale_y_continuous(breaks = seq(0, 11, 1),
limits=c(0,11))
Try this. You were close: you do need unique(). Setting levels = unique(guidelines$state) is what fixes the order; ordered = T additionally marks the factor as ordered. Here is the code (next time, please share your data using dput(), as it can be hard to reuse data from screenshots, especially when they are large):
library(ggplot2)
#Data
guidelines <- data.frame(state=c('AZ','CT','NH','NM','NM','DE','NJ','TN','TN'),
num=c(10,10,10,5,10,5,5,2,5),
grp=c(3,4,17,19,19,5,18,25,25),stringsAsFactors = F)
#Format factor
guidelines$state <- factor(guidelines$state,levels = unique(guidelines$state),ordered = T)
#Plot
ggplot(guidelines, aes(x = state, y = num, group = grp)) +
geom_point() + geom_line(linetype = "dotted") +
labs(x = "State", y = "Number") +
labs(title = "A") +
scale_y_continuous(breaks = seq(0, 11, 1),
limits=c(0,11))
Or, as mentioned in the comments by @TTS, you can use scale_x_discrete() with the limits option:
#Data
guidelines <- data.frame(state=c('AZ','CT','NH','NM','NM','DE','NJ','TN','TN'),
num=c(10,10,10,5,10,5,5,2,5),
grp=c(3,4,17,19,19,5,18,25,25),stringsAsFactors = F)
#Plot 2
ggplot(guidelines, aes(x = state, y = num, group = grp)) +
geom_point() + geom_line(linetype = "dotted") +
labs(x = "State", y = "Number") +
labs(title = "A") +
scale_y_continuous(breaks = seq(0, 11, 1),
limits=c(0,11))+
scale_x_discrete(limits=unique(guidelines$state))
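To see why the levels argument is what controls the axis order, here is a small base R check using the state column from the example data. The names f_default and f_data are just for illustration.

```r
state <- c("AZ", "CT", "NH", "NM", "NM", "DE", "NJ", "TN", "TN")

# Default: levels are sorted alphabetically
f_default <- factor(state)
levels(f_default)

# levels = unique(state): levels follow the order of first appearance,
# and repeated values (NM, TN) are not a problem
f_data <- factor(state, levels = unique(state))
levels(f_data)
```

A discrete axis in ggplot2 follows the level order, so the second factor keeps the states in data order.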

Highlight top 3 values

I have a plot like this with the following code:
aus_cases <- ggplot(data = daily_cases,aes(x= date, as.numeric(V1)))+
geom_col(fill = 'blue', alpha= 0.6)+
theme_minimal(base_size =14)+
xlab(NULL)+
ylab(NULL)+
theme_bw()+
scale_x_date(date_labels = "%d/%m/%Y")
I want to highlight the top 3 values within the plot and show their dates in the plot as well. I was thinking of using gghighlight but am not sure how to do it.
Using the ggplot2::economics dataset as example data you can highlight and label the top 3 values like so:
Add an indicator for the top3 values to your df using e.g. the rank function.
To highlight, map the top3 indicator on fill.
To add the dates use geom_text to add labels only for the top3 values
Try this:
library(ggplot2)
library(dplyr)
# Example data
d <- filter(economics, date >= as.Date("2010-01-01"))
# Add top3 indicator
d <- mutate(d, top3 = rank(-psavert) %in% 1:3)
ggplot(data = d, aes(date, psavert, fill = top3)) +
geom_col(alpha = 0.6) +
geom_text(aes(label = ifelse(top3, as.character(date), "")), nudge_y = .1) +
scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "blue")) +
theme_minimal(base_size = 14) +
xlab(NULL) +
ylab(NULL) +
theme_bw() +
scale_x_date(date_labels = "%d/%m/%Y")
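One thing to be aware of with the rank() indicator is how ties are handled. The sketch below uses made-up vectors x and y, not the economics data; it shows that the default ties.method = "average" can flag fewer than three rows when values tie for third place, while ties.method = "min" flags all tied rows.

```r
# No ties among the largest values: exactly three rows are flagged
x <- c(10, 20, 30, 10, 5, 10, 10, 40, 50, 10)
top3_x <- rank(-x) %in% 1:3
which(top3_x)

# Two values tie for third place
y <- c(50, 40, 30, 30, 10)

# Default ties.method = "average": the tied values both get rank 3.5,
# so neither is flagged
top3_avg <- rank(-y) %in% 1:3

# ties.method = "min": both tied values get rank 3 and are flagged
top3_min <- rank(-y, ties.method = "min") %in% 1:3
```

Which behavior you want depends on whether "top 3" should mean exactly three bars or every bar that reaches the third-highest value.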
Here is one way to do it. You didn't dput your data, so I used this test data.
library(lubridate)
library(tidyverse)
library(gghighlight)
daily_cases <- data.frame(V1 = c(10,20,30, 10, 5, 10, 10, 40, 50, 10),
date = ymd("2020-02-01", "2020-02-02",
"2020-02-03","2020-02-04",
"2020-02-05","2020-02-06",
"2020-02-07","2020-02-08",
"2020-02-09","2020-02-10"))
First I stored the top 3 values and their dates in top. I then used this information in gghighlight() (to highlight the three bars) and in scale_x_date() (to show only the dates of the highlighted bars).
top <- daily_cases %>%
arrange(desc(V1)) %>%
slice(1:3)
aus_cases <- ggplot(data = daily_cases,aes(x= date, as.numeric(V1)))+
geom_col(fill = 'blue', alpha= 0.6)+
gghighlight(V1 >= min(top$V1)) +
theme_minimal(base_size = 14)+
xlab(NULL)+
ylab(NULL)+
theme_bw()+
scale_x_date(breaks = top$date, date_labels = "%d/%m/%Y")

Plot labels at ends of lines in stacked area chart

I have the following code
library(ggplot2)
library(dplyr)
# create data
time <- as.numeric(rep(seq(1,7),each=7)) # x Axis
value <- runif(49, 10, 100) # y Axis
group <- rep(LETTERS[1:7],times=7) # group, one shape per group
data <- data.frame(time, value, group)
# stacked area chart
ggplot(data, aes(x=time, y=value, fill=group)) +
geom_area()+
geom_text(data = data %>% filter(time == last(time)), aes(label = group,
x = time + 0.5,
y = value,
color = group)) +
guides(color = FALSE) + theme_bw() +
scale_x_continuous(breaks = scales::pretty_breaks(10))
This is what I get (plot not shown).
But I am aiming for labels placed at the ends of the areas (linked example not shown).
Is there any solution for stacked area plot?
The question's code places the text labels at the raw values of the last time point, whereas the areas are stacked, so each label needs the cumulative value. The areas are also stacked in reverse group order.
The following graph uses data created with the same code as in the question, but after first calling
set.seed(1234)
# stacked area chart
ggplot(data, aes(x=time, y=value, fill=group)) +
geom_area()+
geom_text(data = data %>%
filter(time == last(time)) %>%
mutate(value = cumsum(rev(value))),
aes(label = rev(group),
x = time + 0.5,
y = value,
color = rev(group))) +
guides(color = "none") + theme_bw() +
scale_x_continuous(breaks = scales::pretty_breaks(10))
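To make the cumsum(rev(value)) step concrete, here is a small base R sketch with made-up heights for four groups at the last time point. It assumes the default stacking order described above (first group on top, last group at the bottom); top, mid and label_pos are illustrative names.

```r
group <- LETTERS[1:4]
value <- c(10, 20, 30, 40) # heights of A, B, C, D at the last time point

# The bottom area belongs to the last group, so reverse before accumulating
top <- cumsum(rev(value)) # upper edge of each area, bottom to top

# Middle of each area: upper edge minus half of that area's height
mid <- top - rev(value) / 2

label_pos <- data.frame(group = rev(group), top = top, mid = mid)
label_pos
```

The first code block in this answer places labels at top; using mid instead centers each label within its area.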
Edit.
Following the discussion in the comments to this answer, I have decided to post code based on the comment by user Jake Kaupp.
ggplot(data, aes(x = time, y = value, fill = group)) +
geom_area()+
geom_text(data = data %>% filter(time == last(time)),
aes(x = time + 0.5, y = value,
label = rev(group), color = rev(group)),
position = position_stack(vjust = 0.5)) +
guides(color = "none") +
theme_bw() +
scale_x_continuous(breaks = scales::pretty_breaks(10))
You can use the text function to put text wherever you want. For example:
text(7.2, 350, "B", col="brown")
Here we go
time <- as.numeric(rep(seq(1,7),each=8)) # x Axis
value <- runif(56, 10, 100) # y Axis
group <- rep(LETTERS[1:8],times=7) # group, one shape per group
data <- data.frame(time, value, group)
# round the values to 2 digits so the printed labels stay short
data$value <- round(data$value, 2)
# stacked area chart
ggplot(data, aes(x=time, y=value, fill=group)) +
geom_area()+
geom_text(aes(x = time + 0.5, y = value, label=ifelse(time == max(time), group, NA)),position = position_stack(vjust = 0.5),check_overlap = TRUE)+
guides(color = "none") + theme_bw()+
scale_x_continuous(breaks = scales::pretty_breaks(10)) +
geom_text(aes(label=ifelse(time != min(time) & time != max(time),value, NA)),position = position_stack(vjust = 0.5),check_overlap = TRUE)+
geom_text(aes(x = time + 0.18,label=ifelse(time == min(time),value, NA)),position = position_stack(vjust = 0.5),check_overlap = TRUE)+
geom_text(aes(x = time - 0.18,label=ifelse(time == max(time),value, NA)),position = position_stack(vjust = 0.5),check_overlap = TRUE)
And get (plot not shown)
The group labels come out as factor level codes rather than letters, because ifelse() drops the factor class.
UPDATED
Converting the factor to character fixes this: data$group <- as.character(data$group)
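A quick base R illustration of the factor issue, using a made-up three-level factor and a made-up logical vector is_last standing in for time == max(time):

```r
group <- factor(c("A", "B", "C"))
is_last <- c(TRUE, FALSE, TRUE)

# ifelse() does not keep the factor class, so the integer codes come back
codes <- ifelse(is_last, group, NA)

# Converting to character first keeps the letters
letters_kept <- ifelse(is_last, as.character(group), NA)
```

Note that since R 4.0.0 data.frame() no longer converts strings to factors by default, so on a recent R version group may already be a character column.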

Shapes on line graph using stat_summary

I'm sure the answer is very simple but at the moment it eludes me. I want to make a line graph using stat_summary(), with different shapes for each group (representing an experimental condition) at each x-axis tick (representing a separate time point).
Here's the data
set.seed(124)
ID <- rep(1:12, times = 3)
Group <- rep(c("A", "B", "C"), times = 12)
score <- rnorm(36, 25, 3)
session <- rep(c("s1","s2", "s3"), each = 12)
df <- data.frame(ID, Group, session, score)
Now I can get there by making a table of means for each time point. Like so.
gMeans <- aggregate(score ~ session + Group, data = df, mean)
And then graphing it like so.
pMeans <- ggplot(data = gMeans, aes(x = session, y = score, group = Group, shape = Group)) +
geom_line(aes(linetype = Group), size = 1) +
geom_point(size = 5, fill = "white") +
scale_color_hue(name = "Group", l = 30) +
scale_shape_manual(name = "Group", values = c(23,22, 21)) +
scale_linetype_discrete(name = "Group") +
theme_bw()
pMeans
However I would like to be able to skip the step of having to make the table of means by using stat_summary(). I can get a similar graph with different line types, but I can't work out how to get the different shapes on each axis tick for each group. I tried the code below and many different permutations of geom_point() and geom_line(), but to no avail. How do I alter the code below to get output that looks like the output derived from the code above?
pline <- ggplot(df, aes(x=session, y=score, group = Group, shape = Group)) +
stat_summary(fun.y="mean", geom="line", size=1.1, aes(linetype=Group, shape = Group)) +
scale_shape_manual(values=c(1:3))
pline
This should help and also clean up the legend:
library(ggplot2)
set.seed(124)
ID <- rep(1:12, times = 3)
Group <- rep(c("A", "B", "C"), times = 12)
score <- rnorm(36, 25, 3)
session <- rep(c("s1","s2", "s3"), each = 12)
df <- data.frame(ID, Group, session, score)
gg <- ggplot(df, aes(x=session, y=score, group = Group, shape = Group))
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
gg <- gg + stat_summary(fun="mean", geom="line", size=1.1,
aes(linetype = Group), show.legend=FALSE)
gg <- gg + stat_summary(fun="mean", geom="point", size=5,
aes(shape = Group), fill="white")
gg <- gg + scale_shape_manual(name = "Group", values = c(23, 22, 21))
gg <- gg + theme_bw()
gg <- gg + theme(legend.key=element_blank())
gg
The lines were obscured, so it makes little sense to keep them in the legend. Since you used stat_summary() for the line (vs geom_line() with an embedded stat="summary"), it's best to keep the idiom for the point geom as well, IMO.
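If you want to convince yourself that the summary layers draw the same numbers as the table of means from the question, here is a base R check. It rebuilds df as in the question and compares aggregate() with a per-cell tapply(); by_hand is an illustrative name, and the claim that stat_summary() with the mean function yields one mean per session within each Group is my understanding of how it works here.

```r
set.seed(124)
ID <- rep(1:12, times = 3)
Group <- rep(c("A", "B", "C"), times = 12)
score <- rnorm(36, 25, 3)
session <- rep(c("s1", "s2", "s3"), each = 12)
df <- data.frame(ID, Group, session, score)

# Table of means as in the question: one row per session/Group pair
gMeans <- aggregate(score ~ session + Group, data = df, FUN = mean)

# The same means as a session-by-Group matrix
by_hand <- tapply(df$score, list(df$session, df$Group), mean)
by_hand
```

Each cell of by_hand corresponds to one point drawn on the line graph.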
