I'd like to add error bars to my barplot. The data for standard deviations is in two different columns.
This is my barplot:
# load packages
> library(data.table)
> library(ggplot2)
> library(tidyr)
>
> results <- fread("Results.csv", header=TRUE, sep=";")
>
> str(results)
Classes ‘data.table’ and 'data.frame': 7 obs. of 5 variables:
$ Organism : chr "AC1432" "D3425" "BF3523" "XR2405" ...
$ Molecule1 : num 39.5 418.4 189.2 49.3 4610.9 ...
$ Molecule1sd: num 19.6 70.9 102.8 21.2 275.9 ...
$ Molecule2 : num 276 6511 235 500 11205 ...
$ Molecule2sd: num 21 291.1 109.7 67.1 94.5 ...
- attr(*, ".internal.selfref")=<externalptr>
>
> df <- data.frame(results)
>
> str(df)
'data.frame': 7 obs. of 5 variables:
$ Organism : chr "AC1432" "D3425" "BF3523" "XR2405" ...
$ Molecule1 : num 39.5 418.4 189.2 49.3 4610.9 ...
$ Molecule1sd: num 19.6 70.9 102.8 21.2 275.9 ...
$ Molecule2 : num 276 6511 235 500 11205 ...
$ Molecule2sd: num 21 291.1 109.7 67.1 94.5 ...
>
>
>
> # Manually set factor levels of 'Organism' column to plot in a logical order.
> df$Organism = factor(df$Organism,
+ levels=c("without organism", "AC1432", "BF3523", "XR2405", "D3425", "XR2463", "ATF259"))
>
> df.g <- gather(df, Molecule1, Molecule2, -Organism, -Molecule1sd, -Molecule2sd)
> df.sd <- gather(df, Molecule1sd, Molecule2sd, -Molecule1, -Molecule2, -Organism)
> ggplot(df.g, aes(Molecule1, Molecule2)) +
+ geom_bar(aes(fill = Organism), stat = "identity", position = "dodge")
[Image: barplot without error bars]
used data:
> dput(df)
structure(list(Organism = structure(c(2L, 5L, 3L, 4L, 6L, 7L,
1L), .Label = c("without organism", "AC1432", "BF3523", "XR2405",
"D3425", "XR2463", "ATF259"), class = "factor"), Molecule1 = c(39.45920899,
418.4234805, 189.162295, 49.314698, 4610.921188, 751.7070352,
35), Molecule1sd = c(19.55450482, 70.91013667, 102.7566193, 21.20841393,
275.8934527, 71.62450643, NA), Molecule2 = c(275.9147606, 6510.974605,
235.247381, 499.8928585, 11205.33907, 9507.869294, 250), Molecule2sd = c(21.04668977,
291.1223384, 109.652064, 67.1000078, 94.54544271, 707.1950335,
NA)), row.names = c(NA, -7L), class = "data.frame")
and this is my attempt at the error bars:
ggplot(df.g, aes(Molecule1, Molecule2)) +
geom_bar(aes(fill = Organism), stat = "identity", position = "dodge") +geom_errorbar(df.sd, aes_Molecule1(ymin=Molecule1-Molecule1sd, ymax=Molecule1+Molecule1sd),aes_Molecule2(ymin=Molecule2-Molecule2sd, ymax=Molecule2+Molecule2sd), width=.2 )
but my idea doesn't work. How can I add error bars from two different columns?
It might be easier if you reshape your dataset with columns for Organism, Molecule, mean and sd. Here is a tidyverse way to do it:
Package and Dataset
library(tidyverse)
df <- data.frame(Organism = c("AC1432", "D3425", "BF3523", "XR2405",
"XR2463", "ATF259", "without organism"),
Molecule1 = c(39.5, 418.4, 189.2, 49.3,
4610.9, 800, 10),
Molecule1sd = c(19.6, 70.9, 102.8, 21.2,
275.9, 100, 1),
Molecule2 = c(276, 6511, 235, 500,
11205, 9500, 250),
Molecule2sd = c( 21, 291.1, 109.7, 67.1,
94.5, 50, 2))
# I estimated the values that are not shown in your str(results) output
Reshaping
df2 <- df %>%
# add a meaningful suffix (m) to the names of the columns containing the means
select(Molecule1m = Molecule1,
Molecule2m = Molecule2,
everything()) %>%
# gather whole dataset into Molecule, mean, sd
pivot_longer(cols = -Organism,
names_to = c("Molecule", ".value"),
names_pattern = "(Molecule[12])(.)") %>%
# factor reorder levels
mutate(Organism = factor(Organism,
levels=c("without organism", "AC1432",
"BF3523", "XR2405",
"D3425", "XR2463", "ATF259")))
Plot
ggplot(df2, aes(x = Molecule,
y = m,
fill = Organism)) +
geom_col(position = "dodge") +
geom_errorbar(aes(ymin = m - s, ymax = m + s),
position = "dodge")
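For reference, here is a compact, self-contained variant of the reshape and plot steps on a made-up two-organism table (the organisms "A" and "B" are hypothetical). It differs from the code above in two ways that are my own choices: the pattern (m|sd) keeps the full suffix, so the spread column is called sd rather than s, and a shared position_dodge(width = 0.9) object with a narrower error bar width is used so that each error bar sits centred on its bar.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: mean columns end in "m", SD columns end in "sd"
wide <- data.frame(
  Organism    = c("A", "B"),
  Molecule1m  = c(39.5, 418.4),
  Molecule1sd = c(19.6, 70.9),
  Molecule2m  = c(276, 6511),
  Molecule2sd = c(21, 291.1)
)

# ".value" turns the second capture group (m or sd) into column names,
# while the first capture group fills the new Molecule column
long <- pivot_longer(wide,
                     cols = -Organism,
                     names_to = c("Molecule", ".value"),
                     names_pattern = "(Molecule[12])(m|sd)")

# The same dodge object is used for bars and error bars so they line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(long, aes(x = Molecule, y = m, fill = Organism)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = m - sd, ymax = m + sd),
                position = dodge, width = 0.2)
print(p)
```

The result has one row per Organism and Molecule, with the mean in m and the standard deviation in sd, which is exactly the shape geom_errorbar needs.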
Question
I am trying to combine three ggplot geom_bar into one geom_bar plot utilising dodge so I can visually compare data across two categorical and one numeric variables. What am I doing wrong?
Individual graphs work
Each graph works on its own (with formatting issues) and I've been following answers on SO like How to overlay two geom_bar? but I don't understand what needs to be done.
ONE <- ggplot(Ireland, aes(TargetGroup, FirstDosePC))+geom_bar(stat = 'identity',width = 0.8, fill = "green") +
facet_grid(.~Vaccine) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title="1st Dose Ireland by Group & Vaccine Type",
caption = "(ECDC, 2021)",
x="Target Groups over 18",
y="First Dose Administered")
TWO <- ggplot(Italy, aes(TargetGroup, FirstDosePC))+ geom_bar(stat = 'identity',width = 0.8, fill = "blue") +
facet_grid(.~Vaccine) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title="1st Dose Italy by Group & Vaccine Type",
caption = "(ECDC, 2021)",
x="Target Groups over 18",
y="First Dose Administered")
THREE <- ggplot(Latvia, aes(TargetGroup, FirstDosePC))+geom_bar(stat = 'identity',width = 0.8, fill = "red") +
facet_grid(.~Vaccine) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title="1st Dose Latvia by Group & Vaccine Type",
caption = "(ECDC, 2021)",
x="Target Groups over 18",
y="First Dose Administered")
An example of failed code
My coding attempts look close to this but it seems to fail - I don't understand why. I am hoping to learn how to add three graphs together with labels and to use dodge
OneTwo <- ONE + geom_bar(FDPercent=Italy, aes(TargetGroup, FirstDose))+ geom_bar(stat = 'identity',width = 0.8, fill = "blue") +
facet_grid(.~Vaccine) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title="1st Dose Italy by Group & Vaccine Type",
caption = "(ECDC, 2021)",
x="Target Groups over 18",
y="First Dose Administered")
My individual graphs look like this
The graph type I'm aiming for
and what I am aiming for is something like this but breaking it out by vaccine type to stretch my learning, etc (source https://towardsdatascience.com/track-covid-19-data-yourself-with-r-eb3e641cd4b3)
My raw data comes from
data <- read.csv("https://opendata.ecdc.europa.eu/covid19/vaccine_tracker/csv/data.csv", na.strings = "", fileEncoding = "UTF-8-BOM")
and has been manipulated while I tested out R functions, which has left me with a data frame called FDPercent. It has a column called FirstDosePC (first doses as a percentage of each country's population; note that it is stored as character in the str() output below), a Country column covering 30 EU countries (ISO 3166-1 alpha-2 codes, categorical) and a TargetGroup column with 10 categories.
> dput(head(FDPercent,3))
structure(list(Country = c("AT", "AT", "AT"), NumberDosesReceived = c(0L,
0L, 61425L), NumberDosesExported = c(0L, 0L, 0L), FirstDose = c(0L,
0L, 87L), FirstDoseRefused = c(NA_integer_, NA_integer_, NA_integer_
), SecondDose = c(0L, 0L, 0L), UnknownDose = c(0L, 0L, 0L), TargetGroup = c("Age18_24",
"Age18_24", "Age18_24"), Vaccine = c("UNK", "AZ", "COM"), Population = c(8901064L,
8901064L, 8901064L), Date = structure(c(18624, 18624, 18624), class = "Date"),
FirstDosePC = c("0.0000", "0.0000", "0.0010")), row.names = 21:23, class = "data.frame")
> str(FDPercent)
'data.frame': 116532 obs. of 12 variables:
$ Country : chr "AT" "AT" "AT" "AT" ...
$ NumberDosesReceived: int 0 0 61425 0 0 0 61425 0 0 0 ...
$ NumberDosesExported: int 0 0 0 0 0 0 0 0 0 0 ...
$ FirstDose : int 0 0 87 0 0 0 1299 0 0 0 ...
$ FirstDoseRefused : int NA NA NA NA NA NA NA NA NA NA ...
$ SecondDose : int 0 0 0 0 0 0 0 0 0 0 ...
$ UnknownDose : int 0 0 0 0 0 0 0 0 0 0 ...
$ TargetGroup : chr "Age18_24" "Age18_24" "Age18_24" "Age18_24" ...
$ Vaccine : chr "UNK" "AZ" "COM" "MOD" ...
$ Population : int 8901064 8901064 8901064 8901064 8901064 8901064 8901064 8901064 8901064 8901064 ...
$ Date : Date, format: "2020-12-28" "2020-12-28" "2020-12-28" "2020-12-28" ...
$ FirstDosePC : chr "0.0000" "0.0000" "0.0010" "0.0000" ...
With help from #kat in the comments - changed from geom_bar() to geom_col() and dropped the third variable
Ireland <- subset(FDPercent, Country == "IE") #constructed a subset by country
Italy <- subset(FDPercent, Country == "IT")
Latvia <- subset(FDPercent, Country == "LV")
ONE1 <- ggplot(Italy, aes(Date, as.numeric(FirstDosePC))) +
geom_col(fill = "red", alpha = 1, width = 7) + theme_minimal(base_size = 8) +
xlab(NULL) + ylab(NULL) + scale_x_date(date_labels = "%Y/%m/%d") #reduced the theme formatting
OneTwo <- ONE1 + geom_col(data=Ireland, aes(Date,
as.numeric(FirstDosePC)),
fill="Green", alpha = 1,width = 5)
OneTwoThree <- OneTwo + geom_col(data=Latvia, aes(Date,
as.numeric(FirstDosePC)),
fill="black", alpha = 1, width = 2)
OneTwoThree + labs(title="Ireland, Italy, & Latvia - First Dose comparison", #added labels to the data
subtitle="Using First Dose Delivered per day as a percentage of population",
caption = "(ECDC, 2021)",
x="Date Administered",
y="% Population treated")
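If the goal is side-by-side (dodged) bars per country rather than overlaid layers, one option is to keep all countries in a single data frame and map Country to fill. This is only a minimal sketch on made-up rows shaped like FDPercent; the percentages and the Age25_49 label are hypothetical, and the as.numeric() step assumes FirstDosePC is stored as character, as in the str() output above.

```r
library(ggplot2)

# Hypothetical rows in the shape of FDPercent (values are invented)
fd <- data.frame(
  Country     = rep(c("IE", "IT", "LV"), each = 2),
  TargetGroup = rep(c("Age18_24", "Age25_49"), times = 3),
  FirstDosePC = c("0.0010", "0.0020", "0.0015", "0.0030", "0.0005", "0.0012"),
  stringsAsFactors = FALSE
)

# Convert the character percentages to numbers before plotting
fd$FirstDosePC <- as.numeric(fd$FirstDosePC)

# One layer, one data frame: fill separates the countries, dodge puts them side by side
p <- ggplot(fd, aes(x = TargetGroup, y = FirstDosePC, fill = Country)) +
  geom_col(position = "dodge") +
  labs(title = "First dose by target group and country",
       x = "Target group",
       y = "% population treated")
print(p)
```

With the real data, subset(FDPercent, Country %in% c("IE", "IT", "LV")) would take the place of fd, and facet_grid(. ~ Vaccine) could be added to break the bars out by vaccine type.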
I am trying to make a Shiny app where you can select a miRNA from an input and then plot its survival curve using ggsurvplot. Something is wrong with the call that creates fitSurv, but I am not sure what.
library(dplyr)
require(survminer)
library(tidyverse)
require(reshape2)
library(shiny)
library(tidyr)
require(survival)
example data:
df.miRNA.cpm <- structure(list(`86` = c(5.57979757386892, 17.0240095264258, 4.28380151026145,
13.0457611762755, 12.5531123449841), `175` = c(5.21619202802748,
15.2849097474841, 2.46719979911461, 10.879496005461, 9.66416497290915
), `217` = c(5.42796072966512, 17.1413407297933, 5.15230233060323,
12.2646127361351, 12.1031024927547), `394` = c(-1.1390337316217,
15.1021660424984, 4.63168157763046, 11.1299079134792, 9.55572588729967
), `444` = c(5.06134249676025, 14.5442494311861, -0.399445049232868,
7.45775961504073, 9.92629675808998)), row.names = c("hsa_let_7a_3p",
"hsa_let_7a_5p", "hsa_let_7b_3p", "hsa_let_7b_5p", "hsa_let_7c_5p"
), class = "data.frame")
df.miRNA.cpm$miRNA <- rownames(df.miRNA.cpm)
ss.survival.shiny.miRNA.miRNA <- structure(list(ID = c("86", "175", "217", "394", "444"), TimeDiff = c(71.0416666666667,
601.958333333333, 1130, 1393, 117.041666666667), Status = c(1L,
1L, 0L, 0L, 1L)), row.names = c(NA, 5L), class = "data.frame")
Join the two example data frames:
data_prep.miRNA <- df.miRNA.cpm %>%
tidyr::pivot_longer(-miRNA, names_to = "ID") %>%
left_join(ss.survival.shiny.miRNA.miRNA)
Example of the joined data:
> data_prep.miRNA
# A tibble: 153,033 x 5
miRNA ID value TimeDiff Status
<chr> <chr> <dbl> <dbl> <int>
1 hsa_let_7a_3p 86 5.58 71.0 1
2 hsa_let_7a_3p 175 5.22 602. 1
3 hsa_let_7a_3p 217 5.43 1130 0
4 hsa_let_7a_3p 394 -1.14 1393 0
5 hsa_let_7a_3p 444 5.06 117. 1
6 hsa_let_7a_3p 618 4.37 1508 0
7 hsa_let_7a_3p 640 2.46 1409 0
8 hsa_let_7a_3p 829 0.435 919. 0
9 hsa_let_7a_3p 851 -1.36 976. 0
10 hsa_let_7a_3p 998 3.87 1196. 0
# … with 153,023 more rows
For a selected MicroRNA this works:
fitSurv <- survfit(Surv(data$TimeDiff, data$Status) ~ paste(cut(value , quantile(value , probs = c(0, 0.8)), include.lowest=T)), data = data_prep.miRNA[grep("hsa_let_7a_3p",data_prep.miRNA$miRNA),])
Shiny:
ui.miRNA <- fluidPage(
selectInput("MicroRNA", "miRNA", choices = unique(data_prep.miRNA$miRNA)),
plotOutput("myplot"))
server <- function(input, output, session) {
data_selected <- reactive({
filter(data_prep.miRNA, miRNA %in% input$MicroRNA)
})
output$myplot <- renderPlot({
fitSurv <- survfit(Surv("TimeDiff", "Status") ~ paste(cut("value" , quantile("value" , probs = c(0, 0.8)), include.lowest=T)), data = data_selected)
ggsurvplot(fitSurv ,title="", xlab="Time (Yrs)", ylab="Survival probability",
font.main = 8,
font.x = 8,
font.y = 8,
font.tickslab = 8,
font.legend=8,
pval.size = 3,
pval.coord = c(1000,1),
size=0.4,
legend = "right",
censor.size=2,
break.time.by = 365,
pval =T,#"p=0.003",#"p=0.41",
#xscale=365,
#palette = c("#E7B800", "#2E9FDF"),
#ggtheme = theme_bw(),
risk.table = F,
xscale=365.25,
xlim=c(0,7*365))
})
}
shinyApp(ui.miRNA, server)
There are several mistakes in this statement:
fitSurv <-
survfit(Surv("TimeDiff", "Status") ~ paste(cut("value", quantile("value", probs = c(0, 0.8)), include.lowest=T)),
data = data_selected)
First, data_selected is a reactive conductor, not a dataframe. If you want the dataframe returned by this reactive conductor, you have to use parentheses: data_selected().
Next, you must not quote the variables: TimeDiff and not "TimeDiff", etc.
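The paste call is unnecessary: cut already returns a factor that can serve as the grouping variable.
With probs = c(0, 0.8), quantile returns only two break points, so your cut produces a single interval and assigns NA to everything above the 80th percentile. To get two intervals as categories, use probs = c(0, 0.8, 1) in quantile.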
Finally, it is not a good idea to use T for TRUE, because T can be set to any R object, while TRUE is a reserved word.
To conclude, here is the corrected code:
fitSurv <-
survfit(Surv(TimeDiff, Status) ~ cut(value, quantile(value, probs = c(0, 0.8, 1)), include.lowest=TRUE),
data = data_selected())
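The effect of the extra break point can be checked outside Shiny. The following is a small base R sketch on simulated numbers (the vector value here is random data, not the miRNA expression values):

```r
set.seed(1)
value <- rnorm(100)  # hypothetical stand-in for the expression values

# Two break points: a single interval, everything above the 80th percentile becomes NA
one_bin <- cut(value, quantile(value, probs = c(0, 0.8)), include.lowest = TRUE)

# Three break points: two intervals, the bottom 80% and the top 20%
two_bins <- cut(value, quantile(value, probs = c(0, 0.8, 1)), include.lowest = TRUE)

table(one_bin, useNA = "ifany")
table(two_bins, useNA = "ifany")
```

The first table shows one level plus an NA count, the second shows two levels and no NA, which is what survfit needs in order to compare two groups.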
I would like to plot a graph with a separate line for each row, with the column names on the x axis. Finally, I would also like each line to look different from the others, with a legend for the reader.
Thank you in advance.
My data:
Average 2003-2005 Average 2006-2008 Average 2009-2010 Average 2011-2013 Average 2014-2016
31.48489 32.53664 30.41938 30.53870 31.15550
18.78799 17.78141 17.58791 17.03071 17.25654
107.46615 107.71512 109.55090 110.31438 109.66492
> str(Table_1_2003_2018_All)
'data.frame': 3 obs. of 6 variables:
$ Average 2003-2005: num 31.5 18.8 107.5
$ Average 2006-2008: num 32.5 17.8 107.7
$ Average 2009-2010: num 30.4 17.6 109.6
$ Average 2011-2013: num 30.5 17 110.3
$ Average 2014-2016: num 31.2 17.3 109.7
$ Average 2017-2018: num 31.8 16.8 109.8
Code:
# Plot 1
colnames(Table_1_2003_2018_All) <- c("2003-2005","2006-2008","2009-2010","2011-2013","2014-2016","2017-2018")
plot(seq_along(Table_1_2003_2018_All),
Table_1_2003_2018_All[1,], type="l", xaxt = 'n',xlab = 'Time Periods', ylab = 'Average',
main = "MARKET WORK", ylim = c(30,35)
)
axis(1, at = 1:6, colnames(Table_1_2003_2018_All))
Thanks in advance.
We can specify 'x' as numeric, i.e. the sequence of column indices, and then change the x labels with axis:
plot(seq_along(Table_1_2003_2018_All),
Table_1_2003_2018_All[1,], type="l", xaxt = 'n',
xlab = 'colnames', ylab = 'first row')
axis(1, at = seq_along(Table_1_2003_2018_All), labels = colnames(Table_1_2003_2018_All))
If we need to plot lines for each row, use matplot
matplot(t(Table_1_2003_2018_All), type = 'l', xaxt = 'n')
legend("top", legend = seq_len(nrow(Table_1_2003_2018_All)),
col= seq_len(nrow(Table_1_2003_2018_All)),cex=0.8,
fill=seq_len(nrow(Table_1_2003_2018_All)))
axis(1, at = seq_along(Table_1_2003_2018_All), labels = colnames(Table_1_2003_2018_All))
data
Table_1_2003_2018_All <- structure(list(Average2003.2005 = c(31.48489, 18.78799, 107.46615
), Average2006.2008 = c(32.53664, 17.78141, 107.71512), Average2009.2010 = c(30.41938,
17.58791, 109.5509), Average2011.2013 = c(30.5387, 17.03071,
110.31438), Average2014.2016 = c(31.1555, 17.25654, 109.66492
)), class = "data.frame", row.names = c(NA, -3L))
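If the lines should differ by line type as well as colour, the same vectors can be passed to both matplot and legend so that the legend keys match the lines. A minimal sketch, assuming the five-column data above with shortened column names; the series labels "Row 1" to "Row 3" are placeholders, since the question does not say what each row represents:

```r
# Same values as the data above, with the period as column name
tab <- data.frame(
  `2003-2005` = c(31.48489, 18.78799, 107.46615),
  `2006-2008` = c(32.53664, 17.78141, 107.71512),
  `2009-2010` = c(30.41938, 17.58791, 109.5509),
  `2011-2013` = c(30.5387, 17.03071, 110.31438),
  `2014-2016` = c(31.1555, 17.25654, 109.66492),
  check.names = FALSE
)

m      <- t(as.matrix(tab))                  # periods in rows, one column per original row
series <- paste("Row", seq_len(ncol(m)))     # placeholder labels for the legend
style  <- seq_len(ncol(m))                   # reused for both colour and line type

matplot(m, type = "l", lty = style, col = style, xaxt = "n",
        xlab = "Time period", ylab = "Average")
axis(1, at = seq_len(nrow(m)), labels = rownames(m))
legend("right", legend = series, lty = style, col = style, cex = 0.8)
```

Using lty and col in legend (instead of fill) draws short line samples, which is easier to read when the plot is printed in black and white.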
I have two data.frames with columns that contain accession numbers
subset of df 1:
sub_df1 <- structure(list(database = "CLO, ArrayExpress, ArrayExpress, ATCC, BCRJ, BioSample, CCLE, ChEMBL-Cells, ChEMBL-Targets, Cosmic, Cosmic, Cosmic, Cosmic-CLP, GDSC, GEO, GEO, GEO, IGRhCellID, LINCS_LDP, Wikidata",
database_accession = "CLO_0009006, E-MTAB-2770, E-MTAB-3610, CRL-7724, 0337, SAMN03471142, SH4_SKIN, CHEMBL3308177, CHEMBL2366309, 687440, 909713, 2159447, 909713, 909713, GSM887568, GSM888651, GSM1670420, SH4, LCL-1280, Q54953204"), .Names = c("database",
"database_accession"), row.names = 2L, class = "data.frame")
subset of df 2:
sub_df2 <- structure(list(database_accession = "SH4_SKIN", G1 = -1.907138,
G2 = -7.617305, G3 = -3.750553, G4 = 2.615004, G5 = 9.751557), .Names = c("database_accession",
"G1", "G2", "G3", "G4", "G5"), row.names = 101L, class = "data.frame")
I would like to merge the two data frames by the column database_accession, but the problem is that they are not exact matches: the string in sub_df2 is a substring of the string in sub_df1.
I thought about using fuzzyjoin but am having a hard time getting the match algorithm right.
The fuzzyjoin solution, using match_fun = str_detect or regex_join():
library(tidyverse); library(fuzzyjoin)
# Load data
sub_df1 <- structure(list(database = "CLO, ArrayExpress, ArrayExpress, ATCC, BCRJ, BioSample, CCLE, ChEMBL-Cells, ChEMBL-Targets, Cosmic, Cosmic, Cosmic, Cosmic-CLP, GDSC, GEO, GEO, GEO, IGRhCellID, LINCS_LDP, Wikidata", database_accession = "CLO_0009006, E-MTAB-2770, E-MTAB-3610, CRL-7724, 0337, SAMN03471142, SH4_SKIN, CHEMBL3308177, CHEMBL2366309, 687440, 909713, 2159447, 909713, 909713, GSM887568, GSM888651, GSM1670420, SH4, LCL-1280, Q54953204"), .Names = c("database", "database_accession"), row.names = 2L, class = "data.frame")
sub_df2 <- structure(list(database_accession = "SH4_SKIN", G1 = -1.907138, G2 = -7.617305, G3 = -3.750553, G4 = 2.615004, G5 = 9.751557), .Names = c("database_accession", "G1", "G2", "G3", "G4", "G5"), row.names = 101L, class = "data.frame")
# Solution
# Using fuzzy_join. Could also use regex_full_join(), which is the wrapper for match_fun = str_detect, mode = "full"
fuzzy_join(sub_df1, sub_df2, match_fun = str_detect, by = "database_accession", mode = "full") %>%
str()
#> 'data.frame': 1 obs. of 8 variables:
#> $ database : chr "CLO, ArrayExpress, ArrayExpress, ATCC, BCRJ, BioSample, CCLE, ChEMBL-Cells, ChEMBL-Targets, Cosmic, Cosmic, Cos"| __truncated__
#> $ database_accession.x: chr "CLO_0009006, E-MTAB-2770, E-MTAB-3610, CRL-7724, 0337, SAMN03471142, SH4_SKIN, CHEMBL3308177, CHEMBL2366309, 68"| __truncated__
#> $ database_accession.y: chr "SH4_SKIN"
#> $ G1 : num -1.91
#> $ G2 : num -7.62
#> $ G3 : num -3.75
#> $ G4 : num 2.62
#> $ G5 : num 9.75
Created on 2019-03-17 by the reprex package (v0.2.1)
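To make it clearer what a substring-based match function does, here is a rough base R sketch of the same idea: every pair of rows is tested with grepl, and the pairs that hit are kept. The two small data frames are hypothetical miniatures of sub_df1 and sub_df2, and fixed = TRUE treats the ID as a literal string rather than a regular expression, which is a choice made for this sketch and not necessarily what str_detect does by default.

```r
# Hypothetical miniature versions of the two data frames
df1 <- data.frame(
  database_accession = c("CLO_0009006, SH4_SKIN, SH4", "E-MTAB-1, OTHER_LUNG"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  database_accession = c("SH4_SKIN", "NOT_THERE"),
  G1 = c(-1.9, 0.3),
  stringsAsFactors = FALSE
)

# All (row of df1, row of df2) combinations
pairs <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))

# TRUE where the df2 ID occurs somewhere inside the df1 string
hit <- mapply(
  function(i, j) grepl(df2$database_accession[j], df1$database_accession[i], fixed = TRUE),
  pairs$i, pairs$j
)

# Keep the matching pairs; the long string gets its own column name
matched <- data.frame(
  accession_list = df1$database_accession[pairs$i[hit]],
  df2[pairs$j[hit], , drop = FALSE],
  row.names = NULL
)
matched
```

One caveat that applies to any substring match: a short ID such as "SH4" would also be found inside "SH4_SKIN", so it may be worth splitting the comma-separated string into individual IDs first if exact ID matches are required.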
You can use the sqldf package and write a query joining the tables with a like condition to test if the value in sub_df1 contains the value in sub_df2.
library(sqldf)
sqldf("
select *
from sub_df2 two
left join sub_df1 one
on one.database_accession like '%' || two.database_accession || '%'
")
I have a table whose header spans two levels of columns. How can I draw a 3D graph from this table, or more generally, how can I plot tables that have such elaborate headers? Please suggest alternative ways to achieve this (if any).
Crime Table:
                                       year
                 2014               2015               2016
                 Reported Detected  Reported Detected  Reported Detected
Murder           221      208       178      172       26       20
Murder(Gain)     20       16        11       9         1        1
Dacoity          51       45        44       36        5        1
Robbery          538      316       351      201       23       10
Chain Snatching  528      394       342      229       23       0
Code:
library(tables)
#CLASS 1 CRIMES 2014
c14 <- structure(list(`Reported` = c(221, 20, 51,
538, 528), `Detected` = c(208, 16, 45, 316, 394)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity", "Robbery", "Chain Snatching"), class = "data.frame")
c14
#CLASS 1 CRIMES 2015
c15 <- structure(list(`Reported` = c(178, 11, 44,
351, 342), `Detected` = c(172, 9,
36, 201, 229)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c15
#CLASS 1 CRIMES 31-01-2016
c16 <- structure(list(`Reported` = c(26, 1, 5,
23, 23), `Detected` = c(20, 1,
1, 10, 0)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c16
# rbind with rownames as a column
st <- rbind(
data.frame(c14, year = '2014', what = factor(rownames(c14), levels = rownames(c14)),
row.names= NULL, check.names = FALSE),
data.frame(c15,year = '2015',what = factor(rownames(c15), levels = rownames(c15)),
row.names = NULL,check.names = FALSE),
data.frame(c16,year = '2016',what = factor(rownames(c16), levels = rownames(c16)),
row.names = NULL,check.names = FALSE)
)
crimetable <- tabular(Heading()*what ~ year*(`Reported` +`Detected`)*Heading()*(identity),data=st)
crimetable
As I hate 3D plots for 3-way tables and I like ggplot2, I suggest this:
Gather your data into "long" format:
library(tidyr)
st_long = gather(st, type, count, -c(year, what))
head(st_long, 3)
# year what type count
# 1 2014 Murder Reported 221
# 2 2014 Murder(Gain) Reported 20
# 3 2014 Dacoity Reported 51
As you can see, both the Detected and Reported columns are now stacked in a single column called type. This is useful for ggplot2, as it can easily create facets. Facets are separate panels within the plot that share the same aesthetic components but work on different groups of data:
library(ggplot2)
ggplot(st_long, aes(year, count, group = what, color = what)) +
geom_line() +
facet_wrap(~ type)
(I am not saying that line plot is the only/best plot here, but it is often used when comparing frequencies across different time-points.)
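As a side note, the tidyr documentation describes gather() as superseded by pivot_longer(). A minimal sketch of the same reshape with pivot_longer(), followed by a dodged bar chart as one possible alternative to the line plot; st_small is a hypothetical two-row excerpt in the shape of st, using the Murder counts for 2014 and 2015 from the tables above:

```r
library(tidyr)
library(ggplot2)

# Two rows in the shape of `st` (Murder counts from the 2014 and 2015 tables)
st_small <- data.frame(
  Reported = c(221, 178),
  Detected = c(208, 172),
  year     = c("2014", "2015"),
  what     = c("Murder", "Murder")
)

# Stack Reported and Detected into a type/count pair of columns
st_long2 <- pivot_longer(st_small,
                         cols = c(Reported, Detected),
                         names_to = "type",
                         values_to = "count")

# Reported vs. detected side by side for each year, one panel per crime
p <- ggplot(st_long2, aes(year, count, fill = type)) +
  geom_col(position = "dodge") +
  facet_wrap(~ what)
print(p)
```

The row order differs from the gather() output (pivot_longer() interleaves the types row by row), but the plot is unaffected by that.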