Multiple time series graph in R

I am trying to create a time series plot that shows several data series across the years. I would like to plot just the years and have each series run from its start date to its end date. Here I have converted the respective columns to dates and then combined them, but I do not get the result I am looking for.
The data is available from this website: https://www.businessinsider.co.za/coronavirus-deaths-how-pandemic-compares-to-other-deadly-outbreaks-2020-4?r=US&IR=T
Something like this where the data doesn't start in the same year or end in the same year:
https://ichef.bbci.co.uk/news/410/cpsprodpb/6E25/production/_111779182_optimised-mortality-nc.png
(time period vs deaths caused)
library(lubridate)
library(ggplot2)
otherDiseaseData <- structure(list(Disease = structure(c(11L, 2L, 12L, 6L, 3L,
1L, 9L, 7L, 13L, 4L, 5L, 8L, 10L), .Label = c("Asian Flu", "blackdeath",
"Cholera", "Covid 19", "Ebola", "HIV", "Hong Kong Flu", "Mers",
"Russian Flu", "Sars", "smallpox", "spanish flu", "Swine Flu"
), class = "factor"), Start = c(0L, 1347L, 1918L, 1981L, 1899L,
1957L, 1889L, 1968L, 2009L, 2019L, 2014L, 2012L, 2002L), End = c(1979L,
1351L, 1919L, 2020L, 1923L, 1958L, 1890L, 1970L, 2010L, 2020L,
2016L, 2020L, 2003L), Death = c(300000L, 225000000L, 50000L,
2360000L, 1500000L, 1100000L, 1000000L, 1000000L, 151700L, 101526L,
11300L, 866L, 774L)), class = "data.frame", row.names = c(NA,
-13L))
yrs <- otherDiseaseData$Start
yr <- as.Date(as.character(yrs), format = "%Y")
yStart <- year(yr)
yrs <- otherDiseaseData$End
yr <- as.Date(as.character(yrs), format = "%Y")
yStart <- year(yr)
otherDiseaseData$x <- paste(otherDiseaseData$Start,otherDiseaseData$End)
otherDiseaseData
ggplot(otherDiseaseData, aes(y = Death, x = otherDiseaseData$x),xlim=0000-2000) + geom_point()

I'm not sure I've fully understood what you're asking for, but my interpretation is this:
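library(dplyr)    # for %>% and filter()
library(ggplot2)
# Long format: one row per disease per endpoint (variable = "Start"/"End", value = year)
df <- reshape::melt(otherDiseaseData, measure.vars = c("Start", "End"))
# Drop the two rows that distort the scales (smallpox starts at year 0; the 225,000,000 row is blackdeath)
plot_df <- filter(df, Disease != "smallpox", Death != 225000000)
ggplot(plot_df) +
  geom_line(aes(value, Death, colour = Disease), size = 2) +
  theme_minimal() +
  # Label each line once, at its End year
  ggrepel::geom_label_repel(data = filter(plot_df, variable != "Start"),
                            aes(label = Disease, x = value, y = Death)) +
  scale_y_log10() +
  theme(legend.position = "none", aspect.ratio = 1) +
  ylab("Number of Deaths") + xlab("Year")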
I've used the reshape package to reorganise the given data, and then ggrepel to label the bars. I've had to remove some data as it really throws the scale, which I've ended up making logarithmic to spread the data out a little. It gives you this plot:
It's not perfect but it might be heading in the right direction? Apologies if I've misunderstood what you were angling for.
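An alternative that skips the reshaping step is to draw each outbreak as a segment from its Start year to its End year. The following is only a minimal sketch, not the method used above: it assumes ggplot2 is installed, and `outbreaks` is a hypothetical hand-typed subset of the question's data.

```r
library(ggplot2)

# Hypothetical subset of the question's data (values copied from the dput output)
outbreaks <- data.frame(
  Disease = c("Asian Flu", "Hong Kong Flu", "Swine Flu", "Ebola", "Sars"),
  Start   = c(1957, 1968, 2009, 2014, 2002),
  End     = c(1958, 1970, 2010, 2016, 2003),
  Death   = c(1100000, 1000000, 151700, 11300, 774)
)

# One horizontal segment per disease, spanning Start to End at height Death
p <- ggplot(outbreaks) +
  geom_segment(aes(x = Start, xend = End, y = Death, yend = Death, colour = Disease)) +
  scale_y_log10() +
  labs(x = "Year", y = "Number of deaths")

print(p)
```

Because the start and end years are mapped directly to `x` and `xend`, the wide Start/End layout can be used as it is.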

Related

change linetype when using dfmelt with breaks

To plot the time series of one month over multiple years I'm using the following code:
JAN<-subset(nDF, format.Date(DATE, "%m")=="01")
dfmelt<-melt(JAN,id.vars="DATE")
breaks <- unique(as.Date(cut(dfmelt$DATE, "month")))
ba2 <- transform(dfmelt, year = as.integer(format(DATE, "%Y")))
p <- ggplot(ba2, aes(x=DATE,y=value,
col=variable)) + labs(title='JANUARY')+
geom_line(lwd=1.0,alpha=0.5) +
facet_grid(cols = vars(year), scales = "free_x", space = "free_x")+
theme(panel.spacing = unit(0, "lines"))
p + scale_x_date(breaks = breaks, date_labels = "%b")
head(JAN)
DATE MODEL BC OBSERVED
215 2001-01-01 1.2860092 1.52571356 1.55332905
216 2001-01-02 0.7906073 1.24322433 1.24701969
217 2001-01-03 0.3687850 0.11566294 0.11677768
218 2001-01-04 0.3539595 0.15826654 0.15906525
219 2001-01-05 0.2531596 0.18768851 0.18768533
220 2001-01-06 0.2311364 0.01537928 0.01516614
However, since BC and OBSERVED have almost the same values, I would like to change the linetype of MODEL and OBSERVED only. How do I achieve this? Any change I make is reflected in all three lines.
Add linetype= to your aesthetics. Perhaps:
p <- ggplot(ba2, aes(x=DATE, y=value, color=variable, linetype=variable)) +
labs(title='JANUARY') +
geom_line(lwd=1.0, alpha=0.5) +
facet_grid(cols=vars(year), scales="free_x", space="free_x") +
theme(panel.spacing=unit(0, "lines"))
p
Data
ba2 <- structure(list(DATE = structure(c(11323, 11324, 11325, 11326, 11327, 11328, 11323, 11324, 11325, 11326, 11327, 11328, 11323, 11324, 11325, 11326, 11327, 11328), class = "Date"), variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("MODEL", "BC", "OBSERVED"), class = "factor"), value = c(1.2860092, 0.7906073, 0.368785, 0.3539595, 0.2531596, 0.2311364, 1.52571356, 1.24322433, 0.11566294, 0.15826654, 0.18768851, 0.01537928, 1.55332905, 1.24701969, 0.11677768, 0.15906525, 0.18768533, 0.01516614), year = c(2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L)), class = "data.frame", row.names = c(NA, -18L))
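Mapping linetype to the variable lets ggplot2 pick the patterns itself. If specific patterns are wanted for specific series, scale_linetype_manual() can assign them by name. A minimal sketch, assuming ggplot2 is installed; `ba2_demo` is a small stand-in for ba2 and the pattern choices in `line_types` are hypothetical:

```r
library(ggplot2)

# Small stand-in for ba2 (first three days of the sample data)
ba2_demo <- data.frame(
  DATE = rep(as.Date("2001-01-01") + 0:2, times = 3),
  variable = factor(rep(c("MODEL", "BC", "OBSERVED"), each = 3),
                    levels = c("MODEL", "BC", "OBSERVED")),
  value = c(1.2860092, 0.7906073, 0.3687850,
            1.52571356, 1.24322433, 0.11566294,
            1.55332905, 1.24701969, 0.11677768)
)

# Hypothetical choice: BC stays solid, the other two get distinct patterns
line_types <- c(MODEL = "dashed", BC = "solid", OBSERVED = "dotted")

p <- ggplot(ba2_demo, aes(x = DATE, y = value, colour = variable, linetype = variable)) +
  geom_line(alpha = 0.5) +
  scale_linetype_manual(values = line_types)

print(p)
```

Naming the elements of `line_types` ties each pattern to a factor level, so the assignment does not depend on level order.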

scale_color_gradient for geom_curve in ggplot

Currently I have a dataframe consisting of several flights.
Using ggplot as shown below, I have managed to plot the flight paths from origin to destination, but I cannot seem to change the path line to a gradient colour that visualises the direction of each flight from origin to destination.
Please advise; from my understanding, a ggplot colour must depend on a variable.
q4c%>%
mutate(TailNum = factor(x=TailNum, levels=c("N351UA","N960DL","N524", "N14998", "N355CA","N711UW", "N587AA", "N839UA","N941CA","N516UA"))) %>%
ggplot() + usMap2 +
geom_curve(aes(x=OriginLong, y=OriginLat, xend=DestLong, yend=DestLat, size=TotalDelay,color=TailNum),
curvature=0.2)+
scale_size_continuous(range = c(0.02, 0.5))+
geom_point(aes(x=OriginLong, y=OriginLat),
size=0.02) +
geom_point(aes(x=DestLong, y=DestLat),
size=0.02) +
facet_wrap(~TailNum)
Edit:
I have tried ggforce (geom_link2 in the code below), adding a dummy sequence column (seqnum) to get the colour contrast, but it only shows solid colours instead of a gradient.
structure(list(Year = c(2005L, 2005L, 2005L, 2005L, 2005L, 2005L
), Month = c(1L, 1L, 1L, 1L, 1L, 1L), DayofMonth = c(1L, 1L,
1L, 1L, 1L, 1L), DepTime = c(1022L, 1025L, 1037L, 1054L, 1110L,
1111L), ArrTime = c(1527L, 1057L, 1209L, 1219L, 1454L, 1409L),
DepDelay = c(3L, 0L, -10L, -1L, 65L, -2L), ArrDelay = c(6L,
-3L, -11L, -27L, 52L, -8L), TotalDelay = c(9L, -3L, -21L,
-28L, 117L, -10L), TailNum = c("N351UA", "N524", "N14998",
"N941CA", "N355CA", "N587AA"), Origin = c("DEN", "PHX", "LBB",
"LGA", "SLC", "DFW"), Dest = c("CLT", "BUR", "IAH", "GSO",
"STL", "IND"), AirportOrigin = c("Denver Intl", "Phoenix Sky Harbor International",
"Lubbock International", "LaGuardia", "Salt Lake City Intl",
"Dallas-Fort Worth International"), OriginLong = c(-104.6670019,
-112.0080556, -101.8227778, -73.87260917, -111.9777731, -97.0372
), OriginLat = c(39.85840806, 33.43416667, 33.66363889, 40.77724306,
40.78838778, 32.89595056), AirportDest = c("Charlotte/Douglas International",
"Burbank-Glendale-Pasadena", "George Bush Intercontinental",
"Piedmont Triad International", "Lambert-St Louis International",
"Indianapolis International"), DestLong = c(-80.94312583,
-118.3584969, -95.33972222, -79.9372975, -90.35998972, -86.29438417
), DestLat = c(35.21401111, 34.20061917, 29.98047222, 36.09774694,
38.74768694, 39.71732917), id = 1:6, seqnum = c(1, 6, 1,
6, 1, 6)), row.names = c(NA, 6L), class = "data.frame")
dataframe
q4cc%>%
ggplot() + usMap2 +
geom_link2(aes(x=OriginLong, y=OriginLat, size=TotalDelay, colour=seqnum))+
scale_size_continuous(range = c(0.02, 1))+
scale_color_gradient(name="Journey Path", high="red", low="blue")+
scale_alpha_continuous(range=c(0.03,0.3))+
geom_point(aes(x=OriginLong, y=OriginLat),
colour="red",
size=0.02) +
facet_wrap(~TailNum)
New Plot
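One way to get a colour that changes along each path is to give ggplot a variable that changes along it. The sketch below does this by hand: it interpolates points between origin and destination and colours them by a `progress` value from 0 to 1. It is a swapped-in technique, not geom_curve or ggforce: the paths are straight lines, the helper `interpolate_flight` and the data frame `flights` are hypothetical names, and the map layer (usMap2) is left out.

```r
library(ggplot2)

# Two flights copied from the sample data in the question
flights <- data.frame(
  id = 1:2,
  OriginLong = c(-104.6670019, -112.0080556),
  OriginLat  = c(39.85840806, 33.43416667),
  DestLong   = c(-80.94312583, -118.3584969),
  DestLat    = c(35.21401111, 34.20061917)
)

# Hypothetical helper: n evenly spaced points from origin to destination,
# with `progress` running from 0 (origin) to 1 (destination)
interpolate_flight <- function(flight, n = 50) {
  progress <- seq(0, 1, length.out = n)
  data.frame(
    id = flight$id,
    long = flight$OriginLong + progress * (flight$DestLong - flight$OriginLong),
    lat  = flight$OriginLat  + progress * (flight$DestLat  - flight$OriginLat),
    progress = progress
  )
}

# Stack the interpolated points of all flights into one data frame
paths <- do.call(rbind, lapply(seq_len(nrow(flights)),
                               function(i) interpolate_flight(flights[i, ])))

# Each small piece of the path takes the colour of its own progress value
p <- ggplot(paths, aes(x = long, y = lat, group = id, colour = progress)) +
  geom_path() +
  scale_colour_gradient(name = "Journey Path", low = "blue", high = "red")

print(p)
```

The gradient appears because colour is mapped to a quantity that varies within a path; a value that is constant per flight gives one solid colour per line.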

R - countif to a new column

I am trying to produce some charts of the dummy data at the bottom of this message and have a few questions.
Would it be recommended to generate a new dataframe with summary stats so that the Year column becomes unique and the second column provides the total count or can I work with the data as is?
Related to this, if I do want to create a new dataframe, what is the best way to make it so that it has: Year, TotalCount, Counts per Term, Counts per Society?
My dummyyearcount dataframe has been created using:
dummyyearcount <- count(dummydata, 'Year')
Is there a way to do multiple counts within the one line of code? If so, how?
Regarding the plots, I am looking to plot a cumulative line plot; however, when running the code below, it is looking for a y-axis value. Is there a way to make it count the number of publications within each year and then split it out by society or term, rather than me having to output a summary table and feed in the total count as the y-axis?
The code below is what I have for the line plot, which complains with:
"Error: geom_line requires the following missing aesthetics: y"
Also, how can I make this cumulative so in years of no publications it will just flat line?
ggplot() + aes(dummydata$Year, group=dummydata$Term, color=dummydata$Term) + geom_line(show.legend = TRUE) +
theme(axis.ticks=element_line(colour = 'black'), panel.background = element_rect('white'),
panel.grid.major = element_line(colour = 'gray85'), panel.border = element_rect(colour = 'black', fill = FALSE)) +
scale_y_continuous(expand = c(0,0), limits = c(0,5)) + scale_x_continuous(expand = c(0,0))
Output from dput():
structure(list(Year = c(2017L, 2011L, 2012L, 2010L, 2011L, 2015L,
2011L, 2011L, 2012L, 1994L, 2005L, 2009L, 1976L, 2007L, 2014L,
2013L, 2007L), Title = structure(1:17, .Label = c("Title of paper A",
"Title of paper B", "Title of paper C", "Title of paper D", "Title of paper E",
"Title of paper F", "Title of paper G", "Title of paper H", "Title of paper I",
"Title of paper J", "Title of paper K", "Title of paper L", "Title of paper M",
"Title of paper N", "Title of paper O", "Title of paper P", "Title of paper Q"
), class = "factor"), Authors = structure(c(1L, 1L, 2L, 1L, 3L,
4L, 7L, 1L, 8L, 5L, 4L, 6L, 10L, 10L, 9L, 4L, 2L), .Label = c("Bloggs",
"Jones", "Jones and Bloggs", "Smith", "Smith and Jones", "Smith, Jones and Wilson",
"White", "White and Bloggs", "Wilson", "Wilson and Jones"), class = "factor"),
Society = structure(c(4L, 4L, 1L, 1L, 4L, 4L, 2L, 3L, 4L,
1L, 1L, 4L, 4L, 2L, 4L, 4L, 4L), .Label = c("ABC", "MNO",
"N", "XYZ"), class = "factor"), Term = structure(c(1L, 1L,
1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L
), .Label = c("A", "B"), class = "factor")), .Names = c("Year",
"Title", "Authors", "Society", "Term"), class = "data.frame", row.names = c(NA,
-17L))
An example plot of the look I am eventually wanting to achieve:
I am still very new to R so any help would be appreciated.
I like doing it like this using data.table package because it is quite tractable to me (but this is not the only way):
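require(data.table)
library(ggplot2)
# Turn data.frame into a data.table keyed (and sorted) by Term and Year
setDT(dummydata, key = c("Term", "Year"))
# Get number of records in each group, one row per Term/Year combination
counts <- dummydata[, .N, by = .(Term, Year)]
# Running total within each Term (cumsum() inside aes() would accumulate across all rows)
counts[, CumN := cumsum(N), by = Term]
# Plot
ggplot(counts, aes(x = Year, y = CumN, colour = Term)) +
  geom_line()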
Using the count function from the plyr package to count the number of occurrences:
#dummy data
df <- data.frame(Year = sample(1984:2014, 200, replace = TRUE), Title = sample(c("Paper A","Paper B","Paper C","Paper D","Paper E","Paper F","Paper G"), 200, replace = TRUE),Authors = sample(c("Stuart","Jerry","Kevin","Phil","Gru","Nefario","Phil","Josh"),200,replace = TRUE), Society = sample(c("lab1","lab2","lab3","lab4","lab5"),200,replace = TRUE),Term = sample(c("1st","2nd","3rd","4th"),200,replace = TRUE))
#grouping data based on society and year
library(plyr)
df.1 <- count(df, vars = c("Society","Year"))
#plotting the respective line plot
library(ggplot2)
p <- ggplot(df.1,aes(x = Year, y = freq, color = Society, group = Society)) + geom_line() + geom_point() + scale_x_continuous(breaks = df.1$Year)
p
Output Plot :
Additionally, if you want to add Term factor also in graph :
df.2 <- count(df, vars = c("Society","Year","Term"))
p2 <- ggplot(df.2,aes(x = Year, y = freq, color = Society, group = Society, shape = Term)) + geom_line() + geom_point(aes(size = Term)) + scale_x_continuous(breaks = df.2$Year)
p2
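The question also asks for a cumulative line that stays flat in years with no publications. A minimal base R sketch of that idea follows, assuming ggplot2 is installed for the plot; `pubs` is a hypothetical stand-in for dummydata. Counting over a factor whose levels include every year yields zero counts for the empty years, and a per-Term cumsum() then gives the running total.

```r
library(ggplot2)

# Hypothetical stand-in for dummydata: note that 2012 has no publications
pubs <- data.frame(
  Year = c(2010L, 2011L, 2011L, 2013L, 2013L, 2013L),
  Term = c("A", "A", "B", "A", "B", "B")
)

# Using every year in the range as a factor level makes table() report zeros
all_years <- seq(min(pubs$Year), max(pubs$Year))
counts <- as.data.frame(
  table(Year = factor(pubs$Year, levels = all_years), Term = pubs$Term),
  responseName = "N", stringsAsFactors = FALSE
)
counts$Year <- as.integer(counts$Year)

# Running total within each Term, in year order
counts <- counts[order(counts$Term, counts$Year), ]
counts$CumN <- ave(counts$N, counts$Term, FUN = cumsum)

# geom_step() keeps the line flat until the next year's count arrives
p <- ggplot(counts, aes(x = Year, y = CumN, colour = Term)) +
  geom_step()

print(p)
```

Passing two variables to table() also covers the "multiple counts in one line" part of the question, since it counts every Year/Term combination at once.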

Using directlabels::geom_dl when label is the same for two groups

I have a problem with geom_dl() where it is not placing my label correctly because two groups have the same label. I can see that
data$groups <- data$label
inside of the GeomDl call is causing the trouble, but I can't figure out how to fix it.
This is what it currently looks like:
and this is what it should look like:
Here are the data and the ggplot code:
dat <- structure(list(level = structure(c(3L, 3L, 1L, 1L, 2L, 2L), .Label = c("2", "3", "1"), class = "factor"), year = c(2013L, 2014L, 2013L, 2014L, 2013L, 2014L), mean = c(9.86464372862218, 9.61027271206025, 18.3483708337732, 15.3459903281993, 6.75036415837688, 7.33169996044336), pchange = c(" 68%", " 68%", " 76%", " 76%", " 76%", " 76%")), .Names = c("level", "year", "mean", "pchange"), row.names = c(413L, 414L, 419L, 420L, 425L, 426L), class = "data.frame")
ggplot(dat, aes(x = year, y = mean)) +
geom_line(aes(color = level)) +
geom_dl(aes(label=pchange, color=level), method=list("last.qp"))
Here's some voodoo with invisible unicode chars:
dat$pchange2 <- dat$pchange
dat$pchange2[3:4] <- paste0(dat$pchange[3:4], "\u200B")
ggplot(dat, aes(x = year, y = mean)) +
geom_line(aes(color = level)) +
geom_dl(aes(label=pchange2, color=level), method=list("last.qp"))
If you have multiple lines with the same label, you can add the same char once, twice, etc. The same idea can be used to write a minimal preprocessing function which would be a more or less universal solution.
Here's what a complete solution might look like (by @fishgal64):
require(magrittr)
require(dplyr)
dat <- select(dat, level, pchange) %>%
unique() %>%
mutate(pchange2 = ifelse(duplicated(pchange), paste0(pchange, "\u200B"), pchange)) %>%
merge(dat)
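The "once, twice, etc." idea can be written as a small preprocessing function that handles any number of repeated labels. This is only a sketch in base R; `make_labels_unique` is a hypothetical name, and it expects one label per group (here, one per level).

```r
# Hypothetical helper: append k - 1 zero-width spaces to the k-th occurrence
# of each label, so repeated labels become distinct but look identical
make_labels_unique <- function(labels) {
  # occurrence is 1 for the first time a label is seen, 2 for the second, ...
  occurrence <- ave(seq_along(labels), labels, FUN = seq_along)
  paste0(labels, strrep("\u200B", occurrence - 1))
}

# One row per group, as produced by unique() on the level/label pairs
key <- data.frame(level = c("1", "2", "3", "4"),
                  pchange = c(" 68%", " 76%", " 76%", " 76%"))
key$pchange2 <- make_labels_unique(key$pchange)

# All labels are now distinct strings
print(anyDuplicated(key$pchange2))
```

The resulting `key` can be merged back onto the full data by level, in the same way as the pipeline above, and `pchange2` used as the label aesthetic.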

Odd justification of axis text

I am using ggplot2 to plot a pointrange() plot, and both the x and y labels are oddly separate from the end of their respective ticks. This doesn't happen with all of the plots in this particular script, only a few, including this one (which is based on a subset of the available data, but shows the problem):
As you can see, the y axis labels are offset substantially to the left, and the x axis labels are offset substantially below the ticks, to the extent that they are overplotted above the axis label.
The only modification I have made to theme_bw() prior to producing this plot is to set theme_set(theme_bw(base_size = 8)) -- no changes have been deliberately made to the text justification prior to the plot code.
Here is a dput() of the subset plotted below:
TestData:
structure(list(State = c("AL", "AZ", "CA", "CO", "CT", "DC",
"DE", "FL", "GA", "IL"), Year = c(2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L), N = c(22L, 42L, 286L,
99L, 30L, 14L, 20L, 173L, 78L, 29L), Polarization = c(0.352923743188869,
0.505918664112271, 0.445768659699068, 0.555930347461176, -0.0133878043740006,
-0.380342319255035, -0.450998867087007, 0.385507917713463, 0.368070478718073,
0.23368733390603), PolarizationSE = c(0.16021790292877, 0.0650610761652209,
0.0100695976668952, 0.0270310803233059, 0.127526745827604, 0.296328985544823,
0.179653490097689, 0.0180113004747975, 0.0372516250664796, 0.112905812606479
), IDPol = c(0.198743353462518, 0.0416441096132583, 0.0551808637190376,
0.110549247724351, 0.302497569072991, -0.0343523165297017, -0.00367975496702999,
0.0520660142625065, 0.0762126127715774, 0.0936515057040723),
IDPolSE = c(0.102763798140243, 0.0523842634480865, 0.00789292373693809,
0.023425554880421, 0.0918856966184178, 0.184867986813743,
0.122339223641891, 0.0137386656250425, 0.0285951418531372,
0.0896433805255375), Estimate = c(0.00932965761458826, -0.000412018017715892,
0.00315002626133457, 0.00823125148777124, 0.000741919819714724,
-0.0211994171332907, -0.0218353390160545, 0.00290805283382581,
0.00406489584624635, 0.00604261698709428), Std..Error = c(0.00398420082222495,
0.00483343236746232, 0.00186579338568264, 0.0032167092866312,
0.00379995092553099, 0.0128981988697743, 0.0122846784163747,
0.00220581166165486, 0.00335683359383524, 0.00240425995025825
), StateN = structure(c(13L, 25L, 18L, 27L, 4L, 2L, 1L, 15L,
14L, 7L), .Label = c("DE (20)", "DC (14)", "WA (23)", "CT (30)",
"TX (365)", "NY (123)", "IL (29)", "PA (36)", "MI (114)",
"KS (28)", "OK (36)", "NJ (23)", "AL (22)", "GA (78)", "FL (173)",
"SC (69)", "MN (25)", "CA (286)", "OH (85)", "VA (34)", "IN (55)",
"OR (27)", "WI (22)", "NM (64)", "AZ (42)", "TN (77)", "CO (99)",
"LA (83)", "MA (22)", "NC (65)", "MS (63)"), class = "factor")), .Names = c("State",
"Year", "N", "Polarization", "PolarizationSE", "IDPol", "IDPolSE",
"Estimate", "Std..Error", "StateN"), row.names = c(1401L, 1403L,
1404L, 1405L, 1406L, 1407L, 1408L, 1409L, 1410L, 1414L), class = "data.frame")
And here is the code used to produce the plot:
TestData$StateN <- paste(TestData$State, " (", TestData$N, ")", sep = "")
TestData$StateN <- factor(TestData$StateN, levels = TestData$StateN[order(TestData$Polarization)])
ZP17Test <- ggplot(TestData,
aes(x = StateN, y = Polarization,
ymin = Polarization - 1.96 * PolarizationSE, ymax = Polarization + 1.96 * PolarizationSE))
ZP17Test <- ZP17Test + geom_hline(yintercept = 0, colour = I(MyPalette(5)[3]), alpha = I(7/12), size = I(1/3))
ZP17Test <- ZP17Test + geom_pointrange(size = I(1/3))
ZP17Test <- ZP17Test + scale_x_discrete("State (Number of Respondents)")
ZP17Test <- ZP17Test + opts(title = "State Polarization Levels in 2008",
axis.text.x = theme_text(angle=45, hjust=1, size = 7))
print(ZP17Test)
ggsave(plot = ZP17Test, "Analysis/Stack_Overflow.png", h = 4, w = 6)
Thanks in advance for any help you can offer.
Angled axis text labels always mess with my head. What you wanted was this:
axis.text.x = theme_text(hjust = 1,vjust = 1,angle=45, size = 7)
The order you specify them in makes a difference, in my experience. I always have to fiddle with it until I get it just right. Smarter folks can probably remember the system to it.
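In current ggplot2 releases, opts() and theme_text() have been replaced by theme() and element_text(). A sketch of the same fix in that syntax, assuming a recent ggplot2 is installed; `test_demo` is a hypothetical three-row stand-in for TestData with rounded values:

```r
library(ggplot2)

# Three rows copied from TestData in the question (values rounded)
test_demo <- data.frame(
  StateN = c("AL (22)", "AZ (42)", "CA (286)"),
  Polarization = c(0.353, 0.506, 0.446),
  PolarizationSE = c(0.160, 0.065, 0.010)
)

p <- ggplot(test_demo, aes(x = StateN, y = Polarization,
                           ymin = Polarization - 1.96 * PolarizationSE,
                           ymax = Polarization + 1.96 * PolarizationSE)) +
  geom_pointrange() +
  labs(title = "State Polarization Levels in 2008",
       x = "State (Number of Respondents)") +
  theme_bw(base_size = 8) +
  # hjust = 1 and vjust = 1 anchor the top-right corner of each rotated label at its tick
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 7))

print(p)
```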
