Position dodge with geom_point(), x=continuous, y=factor - r

I have made a function that can plot the loadings from many factor analyses at once, even when their variables do not overlap perfectly (or at all). It works fine, but sometimes factor loadings are identical across analyses, which means that the points get plotted on top of each other.
library(pacman)
p_load(devtools, psych, stringr, plotflow)
source_url("https://raw.githubusercontent.com/Deleetdk/psych2/master/psych2.R")
loadings.plot2 = function(fa.objects, fa.names=NA) {
fa.num = length(fa.objects) #number of fas
#check names are correct or set automatically
if (length(fa.names)==1 && is.na(fa.names)) { # && short-circuits, so a longer names vector is never passed to if()
fa.names = str_c("fa.", 1:fa.num)
}
if (length(fa.names) != fa.num) {
stop("Names vector does not match the number of factor analyses.")
}
#merge into df
d = data.frame() #to merge into
for (fa.idx in 1:fa.num) { #loop over fa objects
loads = fa.objects[[fa.idx]]$loadings
rnames = rownames(loads)
loads = as.data.frame(as.vector(loads))
rownames(loads) = rnames
colnames(loads) = fa.names[fa.idx]
d = merge.datasets(d, loads, 1)
}
#reshape to long form
d2 = reshape(d,
varying = 1:fa.num,
direction="long",
ids = rownames(d))
d2$time = as.factor(d2$time)
d2$id = as.factor(d2$id)
colnames(d2)[2] = "fa"
print(d2)
#plot
g = ggplot(reorder_by(id, ~ fa, d2), aes(x=fa, y=id, color=time, group=time)) +
geom_point(position=position_dodge()) +
xlab("Loading") + ylab("Indicator") +
scale_color_discrete(name="Analysis",
labels=fa.names)
return(g)
}
#Some example plots
fa1 = fa(iris[-5])
fa2 = fa(iris[-c(1:50),-5])
fa3 = fa(ability)
fa4 = fa(ability[1:50,])
loadings.plot2(list(fa1,fa1,fa2))
Here I've plotted the same object twice just to show the effect. The plot has no red points because the green ones from fa.2 are on top. Instead, I want them to be dodged on the y-axis. However, position="dodge" with various settings does not appear to make a difference.
position="jitter" does work, but it is random, so it sometimes fails to separate the points and it makes the plot chaotic to look at.
How do I make the points dodged on the y-axis?

Apparently, position_dodge() can only dodge sideways (along the x-axis), but there is a workaround. The trick is to swap your x and y in aes(), apply position_dodge(), and then call coord_flip(). Note that xlab() and ylab() follow the aesthetics rather than the position on screen, so the labels have to be swapped as well.
g = ggplot(data = reorder_by(id, ~ fa, d2), aes(x=id, y=fa, color=time, group=time)) +
  geom_point(position=position_dodge(width = .5)) +
  xlab("Indicator") + ylab("Loading") +
  scale_color_discrete(name="Analysis",
                       labels=fa.names) +
  coord_flip()
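For readers without the psych2 helpers, here is a minimal self-contained sketch of the same dodge-then-flip idea. The data frame `toy` and its values are made up for illustration; only the column names mirror `d2` from the question.

```r
library(ggplot2)

# Hypothetical toy data: two analyses with identical loadings,
# so the points would overlap exactly without dodging
toy <- data.frame(
  id   = factor(rep(c("var1", "var2", "var3"), times = 2)),
  fa   = rep(c(0.9, 0.7, 0.5), times = 2),
  time = factor(rep(c("fa.1", "fa.2"), each = 3))
)

# Put the factor on x so position_dodge() can separate the groups,
# then flip the coordinates so the factor ends up on the vertical axis
g <- ggplot(toy, aes(x = id, y = fa, color = time, group = time)) +
  geom_point(position = position_dodge(width = 0.5)) +
  xlab("Indicator") + ylab("Loading") +
  coord_flip()

# Inspect the dodged positions ggplot2 computed for the layer
built <- layer_data(g)
```

Printing `g` draws the plot; `built$x` holds the dodged positions, which should differ between the two analyses for each indicator.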

Possible duplicate
In the linked post, the right answer states that one must use position_jitter() instead of position_dodge(). It has worked for me.

Related

Filter datapoints in a scatterplot using a linear equation

I am writing a function to filter datapoints out of my plot based on a linear equation.
I currently have the following function (with another function defined inside it):
MD_filter<- function(dataframe, mz_col){
#In-function MD calculation
MZ<- mz_col
MZR<- trunc(mz_col, digits = 0)#Either floor() or trunc() can be used for this part.
MD<- as.numeric(MZ-MZR)
dataframe<- dataframe%>%
dplyr::mutate(MD)%>%
dplyr::select(MD, everything())
#fit data to m/z defect maxima equation
f<- function(x){#This could be problem `1`, maybe resolved by leaving x....
y<-0.00112*x + 0.01953
return(y)}
fit<-as.data.frame(t(apply(dataframe,1,f)))# t() transforms df to matrix...?
filtered<-dataframe[which((dataframe$MD<= fit$MZ)),]
#keep rows in dataframe if MD is less than or equal to fitted value (mz after equation)
#As "fit" calculated the maximum MD value for each MZ value in the MZ column, we subset fit$MZ, as this contains the dataframe MZ values.
#The MD calculated at the very start, needs to be lower than the equivalent MZ value of the fitted dataframe.
filtered<-write.table(filtered,"feature_list_mz_defect_filtered.txt",sep="\t",col.names=NA)
#Now we have pre filter dataframe (dataframe) and post filter df (filtered)
#2 Different plots: (highlight to be removed as well, so we need a 3rd eventually)
MD_plot<- ggplot(dataframe, aes(x= MZ, y = MD)) +
geom_point() +#THE FOLLOWING PART DOES NOT WORK YET
ggtitle(paste("Unfiltered MD data - ", dataframe))
#stat_smooth(method="lm", se=FALSE)-> For linear line through the plot, but may not be necessary to show
return(MD_plot)#While I do get a plot, I have not yet gotten the equation. I could use my earlier approach maybe.
MD_plot_2<- ggplot(filtered, aes(x= MZ, y = MD)) +#Filtered is basically the second dataframe,
#which subsets datapoints with an Y value (which is the MD), below the linear equation MD...
geom_point() +#THE FOLLOWING PART DOES NOT WORK YET
ggtitle(paste("Filtered MD data - ", dataframe))
#stat_smooth(method="lm", se=FALSE) -> For linear line through the plot, but may not be necessary to show
return(MD_plot_2)
}
The function works as follows:
The argument inputs are a dataframe and a specific column inside that dataframe which I call the mz_col.
From this column a second column, the MD column, is generated.
From here on out I want to make two plots:
ggplot 1: A plot with the mz_col (MZ) values on the X axis and the MD values on the Y axis
ggplot 2: EXACTLY the same as ggplot 1, but I want to filter out the datapoints if MD exceeds the linear equation y<-0.00112*x + 0.01953 (as is visible in the code). This linear line is basically my maximum filter limit in the plot, everything above this I want gone.
I've tried many different solutions. I swapped the "x" argument with mz_col among many other solutions such as trying to use plot() instead of ggplot. Currently I'm getting no plot, but I do get this:
Basically my question is: how do I fix my function so that I get my two plots? The first plot is no real issue, since this already works, but the second plot just won't filter out datapoints based on my linear equation.
I'm quite new to SO and R, so I apologize if anything is unclear. Please let me know if any clarification is needed, and thanks in advance for all the help!
Reproducible sample data:
structure(list(mz = c(446.0394, 346.043, 199.0446, 199.0464, 97.057, 657.0595, 879.0606, 978.0631, 199.0684, 199.0707, 199.0724, 86.0748, 199.0761, 196.0789, 199.0819, 199.0852, 199.0878, 199.089, 908.0935, 147.0958, 199.0999,199.1299, 199.1322, 199.1384, 199.1398, 199.1434, 124.1475, 199.1513, 187.156, 199.1686, 199.1766, 199.1797, 199.1879, 199.1924, 187.1959, 479.1981, 169.1999, 109.2049, 399.2092, 299.2125, 159.2146, 199.2242, 356.2405, 69.2423, 956.4337, 978.5537, 199.5695, 676.5769, 199.5851, 500.6021, 260.6039, 270.6081, 200.6114, 200.6131, 200.6172, 200.6221,
200.6315, 200.6402, 200.6476, 200.766, 200.8591, 200.8732, 200.8768,
200.89, 200.8937, 200.8972, 200.9067, 200.9127, 200.9147, 200.9231,
200.9253, 200.9288, 200.9324, 200.935, 200.9468, 200.9515, 200.9536,
200.9557, 200.9568, 200.9594, 200.9661, 200.968, 200.9729, 200.9745,
200.9819, 200.9837, 200.9858, 200.9937)), row.names = c(NA, -88L), class = c("tbl_df", "tbl", "data.frame"))
I got a bit lost trying to follow your code, but based on your description, does the following work for you?
library(dplyr)
library(ggplot2)
MD_filter <- function(dataframe, mz_col, a = 0.01953, b = 0.00112){
  # rename column so that rest of function doesn't depend on inputted column name
  dataframe[["MZ"]] <- dataframe[[mz_col]]
  # process dataframe
  dataframe <- dataframe %>%
    select(MZ) %>%
    mutate(MD = MZ - trunc(MZ, digits = 0),
           MD.limit = a + b*MZ)
  p1 <- ggplot(dataframe,
               aes(x = MZ, y = MD)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    ggtitle("Unfiltered MD data")
  p2 <- p1 %+% filter(dataframe, MD <= MD.limit) +
    expand_limits(y = range(dataframe[["MD"]])) + # optional (if you want same
                                                  # y-axis range for both plots)
    ggtitle("Filtered MD data")
  cowplot::plot_grid(p1, p2, nrow = 1)
}
Data & usage
dd <- structure(list(mz = c(
446.0394, 346.043, 199.0446, 199.0464, 97.057, 657.0595, 879.0606,
978.0631, 199.0684, 199.0707, 199.0724, 86.0748, 199.0761, 196.0789,
199.0819, 199.0852, 199.0878, 199.089, 908.0935, 147.0958, 199.0999,
199.1299, 199.1322, 199.1384, 199.1398, 199.1434, 124.1475, 199.1513,
187.156, 199.1686, 199.1766, 199.1797, 199.1879, 199.1924, 187.1959,
479.1981, 169.1999, 109.2049, 399.2092, 299.2125, 159.2146, 199.2242,
356.2405, 69.2423, 956.4337, 978.5537, 199.5695, 676.5769, 199.5851,
500.6021, 260.6039, 270.6081, 200.6114, 200.6131, 200.6172, 200.6221,
200.6315, 200.6402, 200.6476, 200.766, 200.8591, 200.8732, 200.8768,
200.89, 200.8937, 200.8972, 200.9067, 200.9127, 200.9147, 200.9231,
200.9253, 200.9288, 200.9324, 200.935, 200.9468, 200.9515, 200.9536,
200.9557, 200.9568, 200.9594, 200.9661, 200.968, 200.9729, 200.9745,
200.9819, 200.9837, 200.9858, 200.9937)),
row.names = c(NA, -88L),
class = c("tbl_df", "tbl", "data.frame"))
MD_filter(dd, "mz")
# MD_filter(dd, "mz", a = 0.02, b = 0.001) # if you want to change the limit
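To see the filtering step in isolation, here is a small base R sketch. It uses three m/z values taken from the sample data and the same coefficients as the question; the variable names (`md`, `md_limit`, `keep`) are chosen just for this illustration.

```r
# Three m/z values picked from the sample data
mz <- c(446.0394, 199.5695, 97.057)

# Mass defect: the fractional part of each m/z value
md <- mz - trunc(mz)

# Maximum allowed mass defect at each m/z (line from the question)
md_limit <- 0.01953 + 0.00112 * mz

# TRUE where the point lies on or below the line, i.e. is kept
keep <- md <= md_limit
kept_mz <- mz[keep]
```

Here the second value is dropped, because its mass defect (about 0.57) exceeds its limit (about 0.24), while the other two are kept.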

How to use ggplot with prop.table(table(x))?

First, I have data with two categorical variables, like this:
nombre <- c("A","B","C","A","D","F","F","H","I","J")
sexo <- c(rep("man",4),rep("woman",6))
edad <- c (25,14,25,76,12,90,65,45,56,43)
pais <- c(rep("spain",3),rep("italy",4),rep("portugal",3))
data <- data.frame(nombre=nombre,sexo=sexo,edad=edad,pais=pais)
If I use:
prop.table(table(data$sexo,data$pais), margin=1)
I can see the relative frequency of the levels, for example for Italy (Man=0.25, Woman=0.5),
but when I try to plot the prop.table(table(x)) I get something different:
ggplot(as.data.frame(prop.table(table(data),margin=1)), aes(x=pais ,y =Freq, fill=sexo))+geom_bar(stat="identity")
The Y axis goes from 0 to 3 and, for example, in the Italy bar Woman=2 and Man=2.5.
I don't want that (and I don't know what it is showing). I want the same values as in the prop.table(table(x)) output.
I think the problem is related to margin=1.
Thank you!
You need to make the same table
tab = prop.table(table(data$sexo,data$pais), margin=1)
tab = as.data.frame(tab)
Then plot:
ggplot(tab,aes(x=Var2,y=Freq,fill=Var1)) + geom_col()
Or simply:
barplot(prop.table(table(data$sexo,data$pais), margin=1))
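As a sketch of why the two tables differ: `table(data)` cross-tabulates all four columns, so `margin=1` normalises within each `nombre` rather than within each `sexo`. Summing those proportions by `sexo` and `pais` reproduces the bar heights described in the question. The names `wrong`, `agg`, `tab` and `tab_df` are just illustrative; naming the table dimensions is an optional extra that avoids the `Var1`/`Var2` column names.

```r
nombre <- c("A","B","C","A","D","F","F","H","I","J")
sexo <- c(rep("man",4), rep("woman",6))
edad <- c(25,14,25,76,12,90,65,45,56,43)
pais <- c(rep("spain",3), rep("italy",4), rep("portugal",3))
data <- data.frame(nombre=nombre, sexo=sexo, edad=edad, pais=pais)

# What the question plotted: a 4-way table, normalised within each nombre
wrong <- as.data.frame(prop.table(table(data), margin = 1))
# Stacked bar heights per sexo/pais (e.g. italy: woman 2, man 0.5)
agg <- aggregate(Freq ~ sexo + pais, data = wrong, FUN = sum)

# The intended 2-way table; named dimensions carry over to the data frame
tab <- prop.table(table(sexo = data$sexo, pais = data$pais), margin = 1)
tab_df <- as.data.frame(tab)  # columns: sexo, pais, Freq
```

With `tab_df`, the plot call can use `aes(x = pais, y = Freq, fill = sexo)` directly.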
You're probably looking for something like position = "dodge"
If I run the following on your data :
P <- prop.table(table(data$sexo,data$pais), margin=1)
ggplot(as.data.frame(P), aes(x = Var2, y = Freq, fill = Var1)) +
geom_bar(stat="identity", position = "dodge")
I output the following graph :

Plotting a data.table with multiple columns (all 1:7 rows) via ggplot with a single geom_point() using aesthetics to color them differently

I intend to compare the timings of two algorithm-based functions, f1 and f2, via microbenchmark. Both work on an rpois-simulated dataset whose sizes are given by the length-7 vector 10^seq(1,4,by=0.5), i.e.:
[1] 10.00000 31.62278 100.00000 316.22777 1000.00000 3162.27766 10000.00000
I am also trying to plot them, using all of the information microbenchmark provides (min, lq, mean, median, uq and max; all of them are required, unlike expr and neval). I need this via ggplot on a log-log scale, with a single geom_point() and aesthetics that give each statistic a different colour. Here is my code for that:
library(ggplot2)
library(microbenchmark)
require(dplyr)
library(data.table)
datasetsizes<-c(10^seq(1,4,by=0.5))
f1_min<-integer(length(datasetsizes))
f1_lq<-integer(length(datasetsizes))
f1_mean<-integer(length(datasetsizes))
f1_median<-integer(length(datasetsizes))
f1_uq<-integer(length(datasetsizes))
f1_max<-integer(length(datasetsizes))
f2_min<-integer(length(datasetsizes))
f2_lq<-integer(length(datasetsizes))
f2_mean<-integer(length(datasetsizes))
f2_median<-integer(length(datasetsizes))
f2_uq<-integer(length(datasetsizes))
f2_max<-integer(length(datasetsizes))
for(loopvar in 1:(length(datasetsizes)))
{
s<-summary(microbenchmark(f1(rpois(datasetsizes[loopvar],10), max.segments=3L),f2(rpois(datasetsizes[loopvar],10), maxSegments=3)))
f1_min[loopvar] <- s$min[1]
f2_min[loopvar] <- s$min[2]
f1_lq[loopvar] <- s$lq[1]
f2_lq[loopvar] <- s$lq[2]
f1_mean[loopvar] <- s$mean[1]
f2_mean[loopvar] <- s$mean[2]
f1_median[loopvar] <- s$median[1]
f2_median[loopvar] <- s$median[2]
f1_uq[loopvar] <- s$uq[1]
f2_uq[loopvar] <- s$uq[2]
f1_max[loopvar] <- s$max[1]
f2_max[loopvar] <- s$max[2]
}
algorithm<-data.table(f1_min ,f2_min,
f1_lq, f2_lq,
f1_mean, f2_mean,
f1_median, f2_median,
f1_uq, f2_uq,
f1_max, f2_max, datasetsizes)
ggplot(algorithm, aes(x=algorithm,y=datasetsizes)) + geom_point(aes(color=algorithm)) + labs(x="N", y="Runtime") + scale_x_continuous(trans = 'log10') + scale_y_continuous(trans = 'log10')
I debugged my code at each step, and up to the assignment of the computed values to a data.table named 'algorithm' it works fine.
Here are the computed timings, which are passed as length-7 vectors into the data.table along with datasetsizes (also length 7) at the end:
> algorithm
f1_min f2_min f1_lq f2_lq f1_mean f2_mean f1_median f2_median f1_uq f2_uq f1_max f2_max datasetsizes
1: 86.745000 21.863000 105.080000 23.978000 113.645630 24.898840 113.543500 24.683000 120.243000 25.565500 185.477000 39.141000 10.00000
2: 387.879000 52.893000 451.880000 58.359000 495.963480 66.070390 484.672000 62.061000 518.876500 66.116500 734.149000 110.370000 31.62278
3: 1608.287000 341.335000 1845.951500 382.062000 1963.411800 412.584590 1943.802500 412.739500 2065.103500 443.593500 2611.131000 545.853000 100.00000
4: 5.964166 3.014524 6.863869 3.508541 7.502123 3.847917 7.343956 3.851285 7.849432 4.163704 9.890556 5.096024 316.22777
5: 23.128505 29.687534 25.348581 33.654475 26.860166 37.576444 26.455269 37.080149 28.034113 41.343289 35.305429 51.347386 1000.00000
6: 79.785949 301.548202 88.112824 335.135149 94.248141 370.902821 91.577462 373.456685 98.486816 406.472393 135.355570 463.908240 3162.27766
7: 274.367776 2980.122627 311.613125 3437.044111 337.287131 3829.503738 333.544669 3820.517762 354.347487 4205.737045 546.996092 4746.143252 10000.00000
The microbenchmark computed values fine as expected but the ggplot throws up this error:
Don't know how to automatically pick scale for object of type data.table/data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (7): colour, x
I have not been able to resolve this. Can anyone tell me what is wrong and how to correct the plotting procedure?
As a side note, I had to extract all the values (min, lq, mean, median, uq, max) separately from the computed benchmark, since I can't take the summary itself as a data.table because it contains the expr (expression) and neval columns. I was able to eliminate one of the columns using
algorithm[,!"expr"] or algorithm[,!"neval"]
but I can't eliminate two of them together, i.e.
algorithm[,!"expr",!"neval"] or algorithm[,!("expr","neval")] or algorithm[,!"expr","neval"]
- all possible combinations like that don't work (throws 'invalid argument type' error).
Any possible workaround or solution to this and the plotting (main thing) would be highly appreciated!
Your problem lies mainly with the fact that you're referring to an algorithm column in the ggplot formula that does not exist in your object.
From what you gave, I could do the following :
algorithm$algorithm <- 1:nrow(algorithm)
ggplot(algorithm, aes(x=algorithm,y=datasetsizes)) + geom_point(aes(color=algorithm)) + labs(x="N", y="Runtime") +
scale_x_continuous(trans = 'log10') + scale_y_continuous(trans = 'log10')
and plot this fine :
EDIT : let's clean this up a bit...
As per OP's request, I've cleaned up his code a bit.
There are a lot of things you can work on to improve on your code's readability, but I'm focusing more on the practical aspect here.
Basically, join your variables together in a table if you know they'll end up as such.
There are a bunch of tricks you can use to assign the values to the correct spots, a few of which you'll see in the code below.
library(ggplot2)
library(microbenchmark)
require(dplyr)
library(data.table)
datasetsizes <- c(10^seq(1,4,by=0.5))
l <- length(datasetsizes)
# make a vector with your different conditions
conds <- c('f1', 'f2')
# initializing a table from the get-go is much cleaner
# than doing everything in separate variables
dat <- data.frame(
  datasetsizes = rep(datasetsizes, each = length(conds)), # make replicates for each condition
  cond = rep(NA, l*length(conds))
)
dat[, c("min", "lq", "mean", "median", "uq", "max")] <- 0
dat$cond <- factor(dat$cond, levels = conds)
head(dat)
for(i in 1:l){ # for the love of god, don't use something as long as 'loopvar' as an iterator
  # I don't have f1 & f2 so I did what I could...
  s <- summary(microbenchmark(
    "f1" = rpois(datasetsizes[i],10),
    "f2" = {length(rpois(datasetsizes[i],10))}))
  dat[which(dat$datasetsizes == datasetsizes[i]), # select rows of current ds size
      c("cond", "min", "lq", "mean", "median", "uq", "max")] <- s[, !colnames(s)%in%c("neval")]
}
dat <- data.table(dat)
ggplot(dat, aes(x=datasetsizes,y=mean)) +
  geom_point(aes(color = cond)) +
  geom_line(aes(color = cond)) + # added to see a clear difference between conds
  labs(x="N", y="Runtime") + scale_x_continuous(trans = 'log10') +
  scale_y_continuous(trans = 'log10')
This gives the following plot.
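On the side question of dropping several columns at once, and on showing every statistic with a single geom_point(), the following is a rough sketch rather than a tested solution for the real benchmark. The data frame `s` is a hypothetical stand-in for stacked summary rows, with numbers copied from the first and third rows of the table in the question; `N`, `stats_only` and `long` are names invented here. The comment about data.table reflects the deselection syntax shown in the data.table introduction vignette.

```r
library(ggplot2)

# Hypothetical stand-in for stacked summary(microbenchmark(...)) rows
s <- data.frame(
  expr   = c("f1", "f2", "f1", "f2"),
  min    = c(86.745, 21.863, 1608.287, 341.335),
  median = c(113.5435, 24.683, 1943.8025, 412.7395),
  max    = c(185.477, 39.141, 2611.131, 545.853),
  neval  = 100,
  N      = c(10, 10, 100, 100)
)

# Drop two columns by name in one step (base R, works on a data.frame).
# For a data.table the equivalent would be DT[, !c("expr", "neval")]
stats_only <- s[, setdiff(names(s), c("expr", "neval"))]

# Long format: one row per (expr, N, statistic), so a single
# geom_point() can colour by statistic
stat_cols <- c("min", "median", "max")
long <- reshape(s, direction = "long",
                varying = stat_cols, v.names = "runtime",
                timevar = "statistic", times = stat_cols,
                idvar = c("expr", "N"))

g <- ggplot(long, aes(x = N, y = runtime, colour = statistic, shape = expr)) +
  geom_point() +
  scale_x_log10() + scale_y_log10() +
  labs(x = "N", y = "Runtime")
```

One thing worth checking in the real data: the table in the question drops from about 1608 at N = 100 to about 6 at N = 316, which suggests the summaries were reported in different time units. Passing a fixed `unit` to `summary()` on the microbenchmark object should keep the rows comparable.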

R - Random number generation in ggplot and Shiny

I am building a plot of Net Present Value (NPV), using FinCal package, and its odds. For the NPV, the cash-flows are simulated using a triangular distribution for sales, normal distribution for costs and so on. So, here is a snippet of what I am doing:
npvCdf <- function(n) {
N <- sort(n)
P <- ecdf(N)
return(P)
}
makePlot <- function(C, m) {
N <- m$NPV / C$MILLION
P <- npvCdf(N)
#
# NPV distribution curve
n <- sort(N)
p <- P(n) * 100
df <- data.frame(npv = n, odds = p)
#
# Points of interest
o <- C$NPV_BREAK_EVEN_WORST_ODDS
q <- round((quantile(n, o)), C$DIGITS)
e <- C$NPV_BREAK_EVEN_VALUE
b <- P(e) * 100 # THIS IS THE ERROR I CANT FIGURE OUT
w <- o * 100
s <- getBreakEven(C, m)
#
# Labels
npvOdds <- paste("Odds of break-even : ", b, "%")
salesThresh <- paste("Sales threshold : ", s)
worstCase <-
paste("Worst case (# 5% odds) : ", q, "million")
#
# Make plot
#
g <- ggplot(df, aes(x = npv, y = odds)) +
geom_line(colour = "blue") +
labs(title = "NPV and Odds") +
labs(x = "NPV (million)") +
labs(y = "Percent (%)") +
geom_vline(xintercept = e,
colour = "red",
linetype = "longdash") +
geom_hline(yintercept = b,
colour = "green",
linetype = "longdash") +
geom_vline(xintercept = q,
colour = "green",
linetype = "dotdash") +
geom_hline(yintercept = w,
colour = "red",
linetype = "dotdash")
The C is a data frame of all the constants that are used for calculations of cash-flows, NPV calculations, etc. For example, C$MILLION=1000000 used to divide NPV for simpler representation. The m is a data-frame of sales, cash-flows and NPV per simulation. The simulations are used for cash-flows (triangular distribution), variable cost (normal distribution) and so on.
And, here is the Shiny code that uses the above snippet.
library(shiny)
source("../npd-c.R")
# Define server logic
shinyServer(function(input, output) {
output$npdPlot <- renderPlot({
C <- data.frame(2017,5000,1000000,3,100,500000,0.0,0.05,0.1,
input$salesRange[1],
input$salesRange[2],
input$salesMode,
input$demDeclMean,
input$demDeclSd,
input$varCostMean,
input$varCostSd,
input$fixedCostRange[1],
input$fixedCostRange[2]
)
names(C) <-
c(
"SEED",
"ITERATIONS",
"MILLION",
"DIGITS",
"PRICE",
"OUTLAY",
"NPV_BREAK_EVEN_VALUE",
"NPV_BREAK_EVEN_WORST_ODDS",
"HURDLE_RATE",
"SALES_TRIANG_MIN",
"SALES_TRIANG_MAX",
"SALES_TRIANG_MODE",
"DEM_DECL_FACTOR_MEAN",
"DEM_DECL_FACTOR_SD",
"VAR_COST_RATE_MEAN",
"VAR_COST_RATE_SD",
"FIX_COST_RATE_MIN",
"FIX_COST_RATE_MAX"
)
n <- npd(C,-1)
g <- makePlot(C,n)
g
})
})
The problem is as follows.
When I run the same code in plain R, the plot is right in terms of the NPV curve and the horizontal and vertical lines. When it runs as a Shiny application, however, the horizontal and vertical lines are hugely displaced. This happens despite hiving off the NPV and cash-flow code into a separate .R file and setting the same seed value for both the Shiny and non-Shiny versions. For example, P(0)=40.07 without Shiny and P(0)=4.7 with the Shiny application.
What am I missing?
First of all, let me say this is pretty useful code. It is a nice representation of a Monte Carlo simulation using NPV, and I like the plots. It is a post I am pretty sure I will refer back to.
I think I see where the problem is, though. It is basically more a matter of misinterpretation plus one small programming error.
The stated problem is that these plots are not showing the same results although they should be. The blue ecdf-NPV curves do look at first glance to be the same:
Shiny version:
Stand alone version:
However if you look carefully, you will see that in fact they are not the same, the expected NPV value (50 percent) in the first case is about 1.5 million, whereas it is only about 0.2 in the second case.
The curves look the same, but they are not. The other point is that there is an error in one of the calculations further confusing things. The "Odds of break-even" are incorrectly calculated and are actually the "Odds of losing money".
The correct calculation should be:
b <- (1-P(e)) * 100
And the correct odds of breaking even in the first case would be around 60%, and in the second case around 95%, which matches up with the expected NPV as well.
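A tiny base R illustration of the difference, using made-up NPV values (the vector `npv` is hypothetical, not output from the simulation in the question):

```r
# Hypothetical simulated NPVs, in millions
npv <- c(-2, -1, 0.5, 1, 3)

P <- ecdf(npv)  # empirical CDF: P(x) is the share of values <= x
e <- 0          # break-even value

odds_of_loss    <- P(e) * 100        # share of simulations with NPV <= e
odds_break_even <- (1 - P(e)) * 100  # share of simulations with NPV > e
```

Two of the five values are at or below zero, so `odds_of_loss` is 40 and `odds_break_even` is 60.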

ggplot2: add stat_function for particular domain?

I'd like to add a curve to a plot I'm making with ggplot, but I only want the curve to appear for a particular domain.
I've tried various approaches using stat_function:
data <- data.frame(Date = ..., cases = ...)
end_date <- ... ## calculated from a date (e.g., Sys.Date()) minus an offset
start_date <- ... ## end_date - some offset
p1 <- ggplot(data) + aes(x=Date, y=cases) + ... ## data has Date, cases columns
p1 + stat_function(...something..., fun=function(t) ...)
where for something I've tried to put a new, subsetted chunk of data:
data = data[(start_date <= data$Date) & (data$Date <= end_date),] ## no change
and a new aes
aes = aes(xmin = start_date, xmax = end_date)
## error - thinks start_date / end_date don't exist,
## though they are declared earlier
Any suggestions? I've also fiddled around with annotate("path", ...) but nothing concrete there. I feel like this should be something easy, I just don't have my head around the "ggplot way" to make it happen.
It may also be relevant that I'm making these plots in a shiny application, though aside from funny crap w/ data.table, I haven't noticed that affecting anything.
The following seems to work, though it still feels very hacky to me:
data$fit <- ... # evaluate function on Date
relrows <- (start_date <= data$Date) & (data$Date <= end_date)
p1 <- p1 + annotate("line", y=data$fit[relrows], x=data$Date[relrows])
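Try adding another label as a new column in your data frame.
data$newlabel <- NA # initialise so the column has one value per row
data$newlabel[(start_date <= data$Date) & (data$Date <= end_date)] <- a # a: whatever label you choose
Then add groups to your ggplot (the + must end each line, not start the next one):
p1 <- ggplot(data) +
  aes(x=Date, y=cases, group=newlabel, colour=newlabel) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x,2), size=1)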
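Another way to restrict a curve to a date window, sketched here under made-up data, is to evaluate the function into a column and give geom_line() only the rows inside the window. The data frame `toy`, the linear `fit` and the dates are all hypothetical stand-ins for the elided `data`, function and offsets in the question.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data (Date, cases)
set.seed(1)
toy <- data.frame(Date = as.Date("2020-01-01") + 0:59)
toy$cases <- round(10 + 0.5 * (0:59) + rnorm(60, sd = 3))

# Hypothetical window; the question derives these from Sys.Date() and offsets
end_date   <- as.Date("2020-02-20")
start_date <- end_date - 20

# Hypothetical curve, evaluated on every date
toy$fit <- 10 + 0.5 * as.numeric(toy$Date - min(toy$Date))

# Rows that fall inside the window
in_window <- toy$Date >= start_date & toy$Date <= end_date

# Points use all rows; the line layer gets only the windowed rows
p <- ggplot(toy, aes(x = Date, y = cases)) +
  geom_point() +
  geom_line(data = toy[in_window, ], aes(y = fit), colour = "red")
```

This is essentially the annotate("line", ...) approach expressed as a regular layer with its own `data` argument, so it keeps working with Date axes and with any later scale changes.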

Resources