Highlight top 3 values - r

I have a plot like this with the following code:
aus_cases <- ggplot(data = daily_cases, aes(x = date, y = as.numeric(V1))) +
  geom_col(fill = 'blue', alpha = 0.6) +
  theme_minimal(base_size = 14) +
  xlab(NULL) +
  ylab(NULL) +
  theme_bw() +
  scale_x_date(date_labels = "%d/%m/%Y")
I want to highlight the top 3 values within the plot and show their dates as well. I was thinking of using gghighlight but am not sure how to do it.

Using the ggplot2::economics dataset as example data, you can highlight and label the top 3 values like so:
1. Add an indicator for the top 3 values to your data frame, e.g. using the rank() function.
2. To highlight, map the top 3 indicator to fill.
3. To add the dates, use geom_text() to add labels only for the top 3 values.
Try this:
library(ggplot2)
library(dplyr)
# Example data
d <- filter(economics, date >= as.Date("2010-01-01"))
# Add top 3 indicator; ties.method = "first" keeps exactly three TRUEs even if values tie
d <- mutate(d, top3 = rank(-psavert, ties.method = "first") <= 3)
ggplot(data = d, aes(date, psavert, fill = top3)) +
  geom_col(alpha = 0.6) +
  # Label only the top 3 bars with their date
  geom_text(aes(label = ifelse(top3, as.character(date), "")), nudge_y = .1) +
  scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "blue")) +
  xlab(NULL) +
  ylab(NULL) +
  # theme_bw() is a complete theme, so set base_size on it directly
  theme_bw(base_size = 14) +
  scale_x_date(date_labels = "%d/%m/%Y")
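The choice of `ties.method` in `rank()` matters when values tie. This small base-R sketch uses made-up numbers (the vector `x` is hypothetical) to show that the default average ranks can leave fewer than three values flagged by `%in% 1:3`, while `ties.method = "first"` with `<= 3` always flags exactly three:

```r
x <- c(10, 50, 40, 50, 5)   # hypothetical values with a tie at the top

# Default ties.method = "average": both 50s get rank 1.5,
# so neither of them matches 1, 2 or 3 exactly
r_avg <- rank(-x)
top3_default <- r_avg %in% 1:3

# ties.method = "first" breaks ties by position, so ranks run 1..n
# and exactly three values satisfy rank <= 3
top3_first <- rank(-x, ties.method = "first") <= 3

print(r_avg)
print(top3_default)
print(top3_first)
```

If you would rather highlight every value tied for third place, `ties.method = "min"` combined with `<= 3` does that instead (and can flag more than three bars).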

Here is one way to do it. You didn't share your data with dput(), so I used this test data:
library(lubridate)
library(tidyverse)
library(gghighlight)
daily_cases <- data.frame(V1 = c(10,20,30, 10, 5, 10, 10, 40, 50, 10),
date = ymd("2020-02-01", "2020-02-02",
"2020-02-03","2020-02-04",
"2020-02-05","2020-02-06",
"2020-02-07","2020-02-08",
"2020-02-09","2020-02-10"))
First, I store the top 3 values and their dates in top. I then use this information in gghighlight() (to highlight the three bars) and in scale_x_date() (to show only the dates of the highlighted bars).
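# Top 3 rows by V1, largest first
top <- daily_cases %>%
  arrange(desc(V1)) %>%
  slice(1:3)
aus_cases <- ggplot(data = daily_cases, aes(x = date, y = as.numeric(V1))) +
  geom_col(fill = 'blue', alpha = 0.6) +
  # Keep colour only for bars at least as large as the 3rd-largest value
  gghighlight(V1 >= min(top$V1)) +
  xlab(NULL) +
  ylab(NULL) +
  # theme_bw() is a complete theme, so set base_size on it directly
  theme_bw(base_size = 14) +
  # Only show axis breaks at the dates of the highlighted bars
  scale_x_date(breaks = top$date, date_labels = "%d/%m/%Y")
aus_cases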
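For reference, the same `top` selection can be done in base R with `order()`. This sketch rebuilds the test data from this answer and shows the axis labels that `date_labels = "%d/%m/%Y"` produces for the three break dates. Note that the `gghighlight()` condition `V1 >= min(top$V1)` would highlight more than three bars if the third-largest value were tied:

```r
# Test data from the answer, rebuilt in base R
daily_cases <- data.frame(
  V1 = c(10, 20, 30, 10, 5, 10, 10, 40, 50, 10),
  date = seq(as.Date("2020-02-01"), by = "day", length.out = 10)
)
# Rows holding the three largest values, largest first
top <- daily_cases[order(daily_cases$V1, decreasing = TRUE)[1:3], ]
# The break labels that scale_x_date(date_labels = "%d/%m/%Y") would show
break_labels <- format(top$date, "%d/%m/%Y")
# The same threshold test that gghighlight(V1 >= min(top$V1)) applies
highlighted <- daily_cases$V1 >= min(top$V1)
print(break_labels)
print(sum(highlighted))
```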
Here is the plot.

Related

How to change color of moving averages in ggplot, plotting two series into one graph?

To highlight the moving averages in my ggplot visualization, I want to give them a different color (in this case grey or black for both MA lines). With a graph showing two time series, I struggle to find the best solution. Maybe I need to take a different approach.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidyquant))
suppressPackageStartupMessages(library(lubridate))  # ymd(), days()
V <- 365
data <- data.frame(var1 = rnorm(V),
                   var2 = rnorm(V) + 12,
                   date = ymd("2013-01-01") + days(0:364))
data_melted <- reshape2::melt(data, id.vars = "date")
data_melted %>%
  ggplot() +
  geom_line(mapping = aes(x = date, y = value, col = variable)) +
  scale_color_manual(values = c("#CC6666", "steelblue")) +
  geom_ma(ma_fun = SMA, n = 30, mapping = aes(x = date, y = value, col = variable)) +
  theme(axis.text.x = element_text(angle = 50, vjust = 0.5)) +
  scale_x_date(date_breaks = "1 month")
I think you can get what you want by not mapping variable to color in aes() for the MA part. Instead, I include group = variable to tell ggplot2 that the two MAs are separate series, without coloring them differently.
data_melted %>%
ggplot() +
geom_line(mapping = aes(x= date, y=value, col=variable)) +
scale_color_manual(values=c("#CC6666", "steelblue")) +
tidyquant::geom_ma(ma_fun = SMA, n = 30, mapping = aes(x= date, y=value, group = variable), color = "black") +
theme(axis.text.x = element_text(angle = 50, vjust = 0.5)) +
scale_x_date(date_breaks = "1 month")
If you want different colors, the natural way to do this in ggplot2 is to give the moving averages their own values to map to color.
...
scale_color_manual(values=c("#CC6666", "#996666", "steelblue", "slateblue")) +
tidyquant::geom_ma(ma_fun = SMA, n = 30, mapping = aes(x= date, y=value, col=paste(variable, "MA"))) +
...
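Rather than relying on the order in which unnamed `values` are matched to the series, `scale_color_manual()` also accepts a named vector, which pins each color to a series name. This base-R sketch builds such a vector for the line and MA series produced by `paste(variable, "MA")`; the pairing of colors to series is my reading of the snippet above, not something the answer states:

```r
# Raw series names and the MA labels created by paste(variable, "MA")
series <- c("var1", "var2")
ma_series <- paste(series, "MA")
# Pin each color to a name instead of relying on position
series_cols <- c(
  setNames(c("#CC6666", "steelblue"), series),
  setNames(c("#996666", "slateblue"), ma_series)
)
print(series_cols)
```

The result can then be passed as `scale_color_manual(values = series_cols)`.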
I would consider looking at the tsibble package for time series data.
library(tsibble)
library(fabletools)  # provides autoplot() for tsibble objects
data_melted <- as_tsibble(data_melted, key = 'variable', index = 'date')
data_melted <- data_melted %>%
  # compute the moving average separately for each series (key)
  group_by_key() %>%
  mutate(
    `5-MA` = slider::slide_dbl(value, mean,
                               .before = 2, .after = 2, .complete = TRUE)
  ) %>%
  ungroup()
data_melted %>%
  filter(variable == "var1") %>%
  autoplot(value) +
  geom_line(aes(y = `5-MA`), colour = "#D55E00") +
  labs(y = "y",
       title = "title") +
  guides(colour = guide_legend(title = "series"))
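On long-format data such as data_melted, a centred moving average should be computed per series; otherwise the windows at the boundary mix the end of var1 with the start of var2. This base-R sketch (toy vectors, not the question's data) computes the same centred 5-point mean as `slide_dbl(.before = 2, .after = 2, .complete = TRUE)` using `stats::filter()`, once over the whole column and once per series with `ave()`:

```r
# Two toy series stacked in long format, as after melt()
value <- as.numeric(c(1:7, 11:17))
variable <- rep(c("var1", "var2"), each = 7)
# Centred 5-point mean, NA where the full window is not available
ma5 <- function(v) as.numeric(stats::filter(v, rep(1 / 5, 5), sides = 2))
# Over the whole column: windows near the boundary mix var1 and var2
ungrouped <- ma5(value)
# Per series: every window stays inside one variable
grouped <- ave(value, variable, FUN = ma5)
print(ungrouped)
print(grouped)
```

In `ungrouped`, position 6 averages 4, 5, 6, 7 and 11 (giving 6.6), whereas `grouped` correctly leaves it `NA` because var1 has no two values after it.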

Split barplot using position = "fill" into separate facets

I created a bar graph in ggplot using stat = "count" and position = "fill" to show the proportional occurrence of each feature per year (below). I find the readability of this graph rather poor, so I'd like to split it into facets. However, if I add facet_wrap(~Features), the bars in every facet are filled to 100%. How can I prevent this from happening?
The code for my original graph is:
data %>%
  ggplot(aes(x = Year, fill = Features)) +
  geom_bar(stat = "count", position = "fill") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent)
I've tried:
data %>%
  ggplot(aes(x = Year)) +
  stat_count(geom = "bar", aes(y = ..prop..)) +
  facet_wrap(~Features) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90))
but this calculates the proportion within the facet rather than within each year.
Any ideas how I can solve this (using ggplot, rather than by restructuring my data)?
A little about my data:
I have a data frame in which each row is one observed feature (factor) together with the year (factor) in which it was observed. The same feature can occur several times per year, so there are several rows with the same entry for year and feature.
This should work. First, I'll make some data that has similar properties:
library(dplyr)
labs <- c("Digital labels", "Product ID (barcode)",
          "Smart labels", "Product Recommendation",
          "Shopping list", "Product Browsing",
          "Product ID (computer vision)",
          "Navigation (in-store)", "Product ID (RFID)",
          "Other")
years <- vector(mode="list", length=13)
years[[1]] <- c(1,2)
years[[2]] <- c(1,2,8)
years[[3]] <- c(1,2,4,10)
years[[4]] <- c(1,2,3,4,5,6,8,9,10)
years[[5]] <- c(2,3,4,5,8,10)
years[[6]] <- c(1:6, 10)
years[[7]] <- c(1:6, 10)
years[[8]] <- 1:10
years[[9]] <- c(1,3,6,9,10)
years[[10]] <- c(1:5, 7,9,10)
years[[11]] <- 1:10
years[[12]] <- c(1:6, 8:10)
years[[13]] <- c(1,2,3,6,8,9,10)
y <- 2008:2020
dat <- NULL
for(i in 1:13){
  tmp <- tibble(
    # sample() expects a whole-number size, so round the random draw
    Features = sample(years[[i]], round(runif(1, 600, 1000)), replace = TRUE),
    Year = y[i]
  ) %>%
    mutate(Features = factor(Features, levels = 1:10, labels = labs))
  dat <- rbind(dat, tmp)
}
Next, here's the original plot like the one you made initially.
dat %>%
ggplot(aes(x = Year, fill = Features)) +
geom_bar(stat = "count", position = "fill") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90)) +
scale_y_continuous(labels = scales::percent)
And here's how that would translate into different facets. The key is to compute the percentages by hand first, within each year, and then plot them directly.
agdat <- dat %>%
  # count each feature per year, then divide by that year's total
  count(Year, Features) %>%
  group_by(Year) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()
agdat %>%
  filter(Features != "Other") %>%
  ggplot(aes(x = Year, y = pct)) +
  geom_bar(stat = "identity") +
  facet_wrap(~Features, ncol = 3) +
  labs(x = "Year", y = "Percent") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent)
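The percentage step is a row-wise proportion table. As a base-R cross-check on toy data (the feature names `A` and `B` are hypothetical), `prop.table(..., margin = 1)` gives shares that sum to 1 within each year, which is what keeps the faceted bars comparable to the original filled bars:

```r
# Toy observations: one row per observed feature
obs <- data.frame(
  Year = c(2008, 2008, 2008, 2009, 2009),
  Features = c("A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)
# Counts per Year x Feature, then shares within each Year (margin = 1 = rows)
counts <- table(Year = obs$Year, Features = obs$Features)
pct <- prop.table(counts, margin = 1)
print(pct)
# Long format with columns Year, Features, Freq, ready to facet by Features
long <- as.data.frame(pct)
print(long)
```

Unlike `count()`, `table()` also keeps zero-count combinations, so a feature absent in a given year still gets a (zero-height) bar in its facet.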

Plot labels at ends of lines in stacked area chart

I have the following code
library(ggplot2)
library(dplyr)
# create data
time <- as.numeric(rep(seq(1,7),each=7)) # x Axis
value <- runif(49, 10, 100) # y Axis
group <- rep(LETTERS[1:7],times=7) # group, one shape per group
data <- data.frame(time, value, group)
# stacked area chart
ggplot(data, aes(x=time, y=value, fill=group)) +
geom_area()+
geom_text(data = data %>% filter(time == last(time)), aes(label = group,
x = time + 0.5,
y = value,
color = group)) +
guides(color = FALSE) + theme_bw() +
scale_x_continuous(breaks = scales::pretty_breaks(10))
This is what I get:
But I am aiming for the chart shown in the link.
Is there any solution for a stacked area plot?
The code in the question places each text label at that group's own value at the last time point, but the areas are stacked (cumulative), and in reverse order.
The following graph uses data created with the same code as in the question, preceded by
set.seed(1234)
so that the random values are reproducible.
# stacked area chart
ggplot(data, aes(x = time, y = value, fill = group)) +
  geom_area() +
  geom_text(data = data %>%
              filter(time == last(time)) %>%
              # the last group is stacked at the bottom, so accumulate from the end
              mutate(value = cumsum(rev(value))),
            aes(label = rev(group),
                x = time + 0.5,
                y = value,
                color = rev(group))) +
  guides(color = "none") + theme_bw() +
  scale_x_continuous(breaks = scales::pretty_breaks(10))
Edit.
Following the discussion in the comments to this answer, I have decided to post code based on the comment by user Jake Kaupp.
ggplot(data, aes(x = time, y = value, fill = group)) +
  geom_area() +
  geom_text(data = data %>% filter(time == last(time)),
            # same grouping as the areas, so position_stack() puts each
            # label in the middle of its own band
            aes(x = time + 0.5, y = value,
                label = group, color = group),
            position = position_stack(vjust = 0.5)) +
  guides(color = "none") +
  theme_bw() +
  scale_x_continuous(breaks = scales::pretty_breaks(10))
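To see where the labels end up, the stacked positions can be computed by hand. This sketch uses hypothetical values for three groups at the last time point and assumes ggplot2's default stacking order (the first group on top, the last at the bottom); `top` corresponds to the `cumsum(rev(value))` heights of the first version and `mid` to the `position_stack(vjust = 0.5)` version:

```r
# Hypothetical values at the last time point, in group order A, B, C
last_pt <- data.frame(group = c("A", "B", "C"), value = c(30, 20, 10),
                      stringsAsFactors = FALSE)
# Stack from the bottom up: C first, then B, then A on top
bottom_up <- last_pt[rev(seq_len(nrow(last_pt))), ]
bottom_up$top <- cumsum(bottom_up$value)              # upper edge of each band
bottom_up$mid <- bottom_up$top - bottom_up$value / 2  # centre of each band
print(bottom_up)
```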
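You can use the text() function to put text wherever you want on a base-graphics plot. For example:
text(7.2, 350, "B", col = "brown")
text() does not draw on a ggplot2 plot; there, add a layer such as annotate("text", x = 7.2, y = 350, label = "B", colour = "brown") instead.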
Here we go
time <- as.numeric(rep(seq(1, 7), each = 8))  # x Axis
value <- runif(56, 10, 100)                  # y Axis
group <- rep(LETTERS[1:8], times = 7)         # group, one shape per group
data <- data.frame(time, value, group)
round_df <- function(x, digits) {
  # round all numeric variables
  # x: data frame
  # digits: number of digits to round
  numeric_columns <- sapply(x, mode) == 'numeric'
  x[numeric_columns] <- round(x[numeric_columns], digits)
  x
}
data <- round_df(data, 2)
# stacked area chart
ggplot(data, aes(x = time, y = value, fill = group)) +
  geom_area() +
  # group names, centred in each band just right of the last time point
  geom_text(aes(x = time + 0.5, y = value, label = ifelse(time == max(time), group, NA)),
            position = position_stack(vjust = 0.5), check_overlap = TRUE) +
  guides(color = "none") + theme_bw() +
  scale_x_continuous(breaks = scales::pretty_breaks(10)) +
  # values at the interior time points
  geom_text(aes(label = ifelse(time != min(time) & time != max(time), value, NA)),
            position = position_stack(vjust = 0.5), check_overlap = TRUE) +
  # values at the first and last time points, nudged inwards
  geom_text(aes(x = time + 0.18, label = ifelse(time == min(time), value, NA)),
            position = position_stack(vjust = 0.5), check_overlap = TRUE) +
  geom_text(aes(x = time - 0.18, label = ifelse(time == max(time), value, NA)),
            position = position_stack(vjust = 0.5), check_overlap = TRUE)
And get
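The labels showed factor codes instead of letters; fixing that was the next step :)
UPDATED
I just converted the factor to character: data$group <- as.character(data$group)
(This is only needed when group is a factor, which data.frame() produced by default before R 4.0.0.)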
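The codes appear because `ifelse()` does not keep the factor class: values taken from a factor `yes` argument come back as the underlying integer codes. A small base-R check with a toy factor `g`:

```r
g <- factor(c("A", "B", "C"))
keep <- c(TRUE, FALSE, TRUE)
# Factor input: ifelse() returns the integer codes, not the letters
codes <- ifelse(keep, g, NA)
# Character input: the letters survive
letters_kept <- ifelse(keep, as.character(g), NA)
print(codes)
print(letters_kept)
```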

Raster-like timeseries graph in ggplot2

I'm trying to recreate a graph like the one here using ggplot2.
I can get pretty close if I mess around with the size and shape of points using coord_equal, but...
Example data and code
library(ggplot2)
library(dplyr)  # for bind_rows()
df <- data.frame()
Years <- 1990:2020
for(i in seq_along(Years)) {
  Year <- Years[i]
  week <- 1:52
  value <- sort(round(rnorm(52, 50, 30), 0))
  df.small <- data.frame(Year = Year, week = week, value = value)
  df <- bind_rows(df, df.small)
}
ggplot(df, aes(week, Year, color = value)) +
geom_point(shape = 15, size = 2.7) +
scale_color_gradientn(colours = rainbow(10)) +
coord_equal()
The problem is that with my real data I want to "stretch" the graph so I can see it more clearly (my time series is shorter), and when I don't use coord_equal, the squares don't fill the plotting area:
ggplot(df, aes(week, Year, color = value)) +
geom_point(shape = 15, size = 2.7) +
scale_color_gradientn(colours = rainbow(10))
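Is this as simple as using the geom_raster geom? Each value is drawn as a cell of a regular grid, so the cells fill the panel; drop coord_equal() if you want them to stretch with the panel.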
ggplot(df, aes(week, Year)) +
geom_raster(aes(fill = value)) +
scale_fill_gradientn(colours = rainbow(10)) +
coord_equal()
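`geom_raster()` expects a regular grid with one row per cell. As a side note, the example grid can be built without growing a data frame in a loop; this base-R sketch has the same shape and value distribution as the question's example, with `set.seed(1)` added only as an assumption for reproducibility:

```r
set.seed(1)  # added only to make the sketch reproducible
years <- 1990:2020
weeks <- 1:52
# One row per (Year, week) cell; week varies fastest within each Year
df <- data.frame(
  Year = rep(years, each = length(weeks)),
  week = rep(weeks, times = length(years))
)
# Random values, sorted within each Year as in the question's loop
df$value <- ave(round(rnorm(nrow(df), 50, 30)), df$Year, FUN = sort)
str(df)
```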

Plotting a Chart when Data isn't there

I'm just trying to plot monthly data when there is no data for two months. In my code below, the data for April and May is missing (NA). However, when I try to plot the chart, I get an error:
Error: (converted from warning) Removed 2 rows containing non-finite values (stat_boxplot).
If I just add a zero in each entry (instead of NA) I get the plot, but now with a value when there shouldn't be any (see https://i.stack.imgur.com/CHI51.png). If I add na.omit(df) it just removes the two months. Could someone assist me please?
Cost_Delta<-c(85000,-32672.62,28335.64,-85000,30963.5,-28335.64,NA,NA,
-85000,32672.62,85000,-32672.62,-85000,-32672.62,85000,
-32672.62,-85000,32672.62,85000,32672.62,-85000,-32672.62)
Month<-c("Jan","Jan","Feb","Feb","Mar","Mar","Apr","May","Jun","Jun",
"Jul","Jul","Aug","Aug","Sep","Sep","Oct","Oct","Nov","Nov","Dec","Dec")
df<-data.frame(Cost_Delta,Month)
df$Month <- as.character(df$Month)
df$Month <- factor(df$Month, levels=unique(df$Month))
library(ggplot2)
p<-ggplot(df, aes(x=Month, y=Cost_Delta)) +
geom_point(aes(fill=Month), size=2, shape=21, colour="grey20",
position=position_jitter(width=0.2, height=0.1)) +
geom_boxplot(outlier.colour=NA, fill=NA, colour="grey20") +
scale_y_continuous(labels=scales::comma,breaks=seq(-300000,400000,50000)) +
labs(x="Month-Year", y="Cost Delta (Demand-Mean Forecast)")
p
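ggplot2 warns when it drops rows with missing (NA) values. The "(converted from warning)" prefix means your session turns warnings into errors (for example via options(warn = 2)), which is why the plot fails. To explicitly ignore the missing values, pass na.rm = TRUE to each layer: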
Cost_Delta<-c(85000,-32672.62,28335.64,-85000,30963.5,-28335.64,NA,NA,
-85000,32672.62,85000,-32672.62,-85000,-32672.62,85000,
-32672.62,-85000,32672.62,85000,32672.62,-85000,-32672.62)
Month<-c("Jan","Jan","Feb","Feb","Mar","Mar","Apr","May","Jun","Jun",
"Jul","Jul","Aug","Aug","Sep","Sep","Oct","Oct","Nov","Nov","Dec","Dec")
df <- data.frame(Cost_Delta, Month)
df$Month <- as.character(df$Month)
df$Month <- factor(df$Month, levels = unique(df$Month))
library(ggplot2)
p <- ggplot(df, aes(x = Month, y = Cost_Delta)) +
geom_point(
aes(fill = Month),
size = 2,
shape = 21,
colour = "grey20",
position = position_jitter(width = 0.2, height = 0.1),
na.rm = TRUE
) +
geom_boxplot(outlier.colour = NA,
fill = NA,
colour = "grey20",
na.rm = TRUE) +
scale_y_continuous(labels = scales::comma, breaks = seq(-300000, 400000, 50000)) +
labs(x = "Month-Year", y = "Cost Delta (Demand-Mean Forecast)")
p
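The "(converted from warning)" prefix is what R reports when a warning is promoted to an error, for example with `options(warn = 2)`. This base-R sketch uses a hypothetical function `f()` that only warns, to show the promotion and how to restore the previous setting afterwards:

```r
# Hypothetical function that only warns, like ggplot2 dropping NA rows
f <- function() {
  warning("Removed 2 rows containing non-finite values")
  "plotted"
}
old <- options(warn = 2)  # promote warnings to errors
msg <- tryCatch(f(), error = function(e) conditionMessage(e))
options(old)              # restore the previous warning level
# Handling the warning instead lets the call finish normally
res <- withCallingHandlers(f(), warning = function(w) invokeRestart("muffleWarning"))
print(msg)
print(res)
```

Passing `na.rm = TRUE`, as in the answer, stops the warning at its source, so there is nothing left to promote.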
