How to graph a seasonality plot? - r

I want to build a graph to determine the seasonality of the influenza virus from January to December and group it per year (2011-2018), but I have problems with my results.
I would like graphics like the one that appear in this publication : https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193263
outcome<- factor(c(rep("positive", 60)))
month<- sample(1:12,60,replace=T)
year <- sample(2011:2018,60,replace=T)
data<- data.frame(outcome, month, year)
ggplot(data, aes(x=month, y= frequency(data$outcome),
group = year, fill = year)) + geom_col(position = "dodge")

library(dplyr)
data %>%
count(year, month , wt = outcome == "positive") %>%
ggplot(aes(x=month, y= n, group = year, fill = year)) +
geom_col(position = "dodge") +
scale_x_continuous(breaks = 1:12, labels = month.abb)

Assuming you mean the first graphic in the paper the easiest way to do this is a facet_wrap or facet_grid.
Edit- I would also like to add that R does not do block comments like *** if that's what you were trying.

Related

Why can't I get the right horizontal axis labels on my ggplot2 chart?

I am trying to do a faceted plot of a grouped dataframe with ggplot2, using geom_line(). My dataframe has a Date column and I would like to have dates on the horizontal axis. If I just use Date in aes(x=Date, ...) I get nice labels on the horizontal axis. However, the line has an almost horizontal section where the date jumps from the end of one group to the beginning of the next group. This code and chart shows that:
dts <- seq.Date(as.Date("2020-01-01"), as.Date("2021-12-31"), by="day")
mos <- sapply(dts, month)
df <- data.frame(Date=dts, Month=mos)
nr <- nrow(df)
df$X <- rep(1, nr)
df %>%
group_by(Month) -> dfgrp
dfgrp %>%
group_by(Month) %>%
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=Date, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
I would not like my chart to have those almost-horizontal lines when the date changes by a large amount. I was able to generate a chart without those lines using integers on aes() as follows:
dfgrp %>%
mutate(Time = 1:n() %>% as.integer(),
Z = cumsum(X)) %>%
ggplot(aes(x=Time, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
scale_x_continuous(breaks = seq(from=1, to=nr, by=10) %>% as.integer(),
labels = function(x) as.character(dfgrp$Date[x])) +
theme(axis.text.x = element_text(angle=45, size=7))
The line on the chart looks like I want it but the dates on the horizontal axis are not correct: they end in February 2020 in every facet while the dates in the dataframe end in December 2021 and the dates in the first chart begin and end on different months in different facets.
I tried many things but nothing worked. Any suggestions on how to have a chart with dates like in the first chart above and lines like in the second chart above?
Help will be much appreciated.
You may want to adjust the dates to be in the same year, but noting the original year as a variable:
library(lubridate)
dfgrp %>%
group_by(Month) %>%
mutate(year = year(Date),
adj_date = ymd(paste(2020, month(Date), day(Date)))) %>%
# 2020 was leap year so 2/29 won't be lost
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=adj_date, y=Z, color = year, group = year)) +
geom_line(size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))

How to make a dual axis in ggplot R

I have made a time series plot for total count data of 4 different species. As you can see the results with sharksucker have a much higher count than the other 3 species. To see the trends of the other 3 species they need to plotted separately (or on a smaller y axis). However, I have a figure limit in my masters paper. So, I was trying to create a dual axis plot or have the y axis split into two. Does anyone know of a way I could do this?
library(tidyverse)
library(reshape2)
dat <- read_xlsx("ReefPA.xlsx")
dat1 <- dat
dat1$Date <- format(dat1$Date, "%Y/%m")
plot_dat <- dat1 %>%
group_by(Date) %>%
summarise(Sharksucker_Remora = sum(Sharksucker_Remora)) %>%
melt("Date") %>%
filter(Date > '2018-01-01') %>%
arrange(Date)
names(plot_dat) <- c("Date", "Species", "Count")
ggplot(data = plot_dat) +
geom_line(mapping = aes(x = Date, y = Count, group = Species, colour = Species)) +
stat_smooth(method=lm, aes(x = Date, y = Count, group = Species, colour = Species)) +
scale_colour_manual(values=c(Golden_Trevally="goldenrod2", Red_Snapper="firebrick2", Sharksucker_Remora="darkolivegreen3", Juvenile_Remora="aquamarine2")) +
xlab("Date") +
ylab("Total Presence Per Month") +
theme(legend.title = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
The thing is, the problem you're trying to solve doesn't seem like a 2nd Y axis issue. The problem here is of relative scale of the species. You might want to think of something like standardizing the initial species presence to 100 and showing growth or decline from there.
Another option would be faceting by species.

Plotting in r by date range

I have a dataset with 4000 categoric variables which are city names arranged by date. I can do a plot of the entire dataset with an overall count.
What I need to do is be able to plot aggregates of specific cities by specific date ranges. I cannot use by quarter or anything like that because the required date ranges every year are different. I need to be able to, say, select 2016/4/1 to 2016/6/23 to get a count of how many are Denver.
How can I do this?
library(ggplot2)
library(ggpubr)
theme_set(theme_classic())
df <- log %>%
group_by(Location) %>%
summarise(counts = n())
df
ggplot(df, aes(x = Location, y = counts)) +
geom_bar(fill = "#0073C2FF", stat = "identity",width = .65) +
geom_text(aes(label = counts), vjust = -0.3) +
theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
labs(title="Locations of Library Instruction",
subtitle="2016-2020")

How to adjust distances between years on x-axis and adjust line of geom_line(), in ggplot, R-studio?

I would like to create a line plot using ggplot's geom_line() where all distances between years are equal independent of the actual value the year-variable takes and where the dots of geom_point() are connected if there are only two years in between but not if the temporal distance is more than that.
Example:
my.data<-data.frame(
year=c(2001,2003,2005,NA,NA,NA,NA,NA,NA,2019),
value=c(runif(10)))
As for the plot I have tried two different things, both of which are not ideal:
Plotting year as continuous variable with breaks=year and minor_breaks=F, where, obviously the distances between the first three observations are much smaller than the distance between 2005 and 2019, and where, unfortunately, all dots are connected:
library(ggplot2)
library(dplyr)
my.data %>%
ggplot(aes(x=year,y=value)) +
geom_line() +
geom_point() +
scale_x_continuous(breaks=c(2001,2003,2005,2019), minor_breaks=F) +
theme_minimal()
Removing NAs and plotting year as factor which yields equal spacing between the years, but obviously removes the lines between data points:
my.data %>%
filter(!is.na(year)) %>%
ggplot(aes(x=factor(year),y=value)) +
geom_line() +
geom_point() +
theme_minimal()
Are there any solutions to these issues? What am I overlooking?
First attempt:
Second attempt:
What I need (but ideally without the help of Paint):
my.data %>%
ggplot(aes(x=year)) +
geom_line(aes(y = ifelse(year <= 2005,value,NA))) +
geom_point(aes(y = value)) +
scale_x_continuous(breaks=c(2001,2003,2005,2019), minor_breaks=F) +
theme_minimal()
maybe something like this would work
I came to a bit convoluted and not super clean solution, but it might get the job done. I am checking if one year should be connected to the next one with lead(). And "remove" the appropriate connections by turning them white. The dummy column is there to put all years in one line and not two.
my.data = data.frame(year=c(2001,2003,2005,2008,2009,2012,2015,2016,NA,2019),
value=c(runif(10))) %>%
filter(!is.na(year)) %>%
mutate(grouped = if_else(lead(year) - year <= 2, "yes", "no")) %>%
fill(grouped, .direction = "down") %>%
mutate(dummy = "all")
my.data %>%
ggplot(aes(x = factor(year),y = value)) +
geom_line(aes(y = value, group = dummy, color = grouped), show.legend = FALSE) +
geom_point() +
scale_color_manual(values = c("yes" = "black", "no" = "white")) +
theme_classic()

How to change line properties in ggplot2 halfway in a time series?

Take the following straightforward plot of two time series from the economics{ggplot2} dataset
require(dplyr)
require(ggplot2)
require(lubridate)
require(tidyr)
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
group_by(indicator, Y2K) %>%
ggplot(aes(date, percentage, group = indicator, colour = indicator)) + geom_line(size=1)
I would like to change the linetype from "solid" to "dashed" (and possibly also the line size) for all points in the 21st century, i.e. for those observations for which Y2K equals TRUE.
I did a group_by(indicator, Y2K) but inside the ggplot command it appears I cannot use group = on multiple levels, so the line properties only differ by indicator now.
Question: How can I achieve this segmented line appearance?
UPDATE: my preferred solution is a slight tweak from the one by #sahoang:
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
ggplot(aes(date, percentage, colour = indicator)) +
geom_line(size=1, aes(linetype = year(date) >= 2000)) +
scale_linetype(guide = F)
This eliminates the group_by as commented by #Roland, and the filter steps make sure that the time series will be connected at the Y2K point (in case the data would be year based, there could be a visual discontinuity otherwise).
Even easier than #Roland's suggestion:
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
group_by(indicator, Y2K) -> econ
ggplot(econ, aes(date, percentage, group = indicator, colour = indicator)) +
geom_line(data = filter(econ, !Y2K), size=1, linetype = "solid") +
geom_line(data = filter(econ, Y2K), size=1, linetype = "dashed")
P.S. Alter plot width to remove spike artifacts (red line).
require(dplyr)
require(ggplot2)
require(lubridate)
require(tidyr)
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
ggplot(aes(date, percentage, colour = indicator)) +
geom_line(size=1, aes(linetype = Y2K))

Resources