glm fitted values mirrored/won't match - r

I've got a strange problem with plotting the fitted values of a glm.
My code is:
Data <- data.frame("Sp" = c(111.4, 185, 231, 272.5, 309, 342, 371, 399,
424, 447, 469, 489, 508, 527, 543, 560, 575, 589, 603, 616, 630, 642, 653,
664, 675, 685, 695, 705, 714, 725, 731, 740), "nrC" = 1:32)
modell <- glm(Sp ~ nrC, data = Data, family = Gamma)
pred <- predict(modell, newdata = data.frame("nrC" = 1:32), type = "response")
plot(Data$nrC, Data$Sp, xlim = c(0, 40), ylim = c(50, 1000))
lines(Data$nrC, pred, col = "blue")
The blue line representing the fitted values seems to be ok, apart from being horizontally mirrored.
I'm relatively new to this, so maybe I'm missing something obvious here, but I can't figure out what's wrong.
Doing the same with the data presented here works perfectly fine.
I'd be grateful for any hints!

The Gamma distribution isn't quite right for this data set. As you have it formulated, the data shown in the plot follow a square-root-ish looking function. Try specifying the model like this:
modell <- glm(Sp ~ sqrt(nrC), data = Data, family = gaussian)
pred <- predict(modell, newdata = data.frame("nrC" = 1:32), type = "response")
plot(Data$nrC, Data$Sp, xlim = c(0, 40), ylim = c(50, 1000))
lines(Data$nrC, pred, col = "blue")
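For a quick visual comparison, here is a minimal sketch that overlays both fits on one plot; it reuses only the Data frame from the question, and the object names are illustrative:
# Refit both models and overlay their fitted curves
mod_gamma <- glm(Sp ~ nrC, data = Data, family = Gamma)          # default inverse link
mod_sqrt  <- glm(Sp ~ sqrt(nrC), data = Data, family = gaussian)
grid <- data.frame(nrC = 1:32)
plot(Data$nrC, Data$Sp, xlim = c(0, 40), ylim = c(50, 1000))
lines(grid$nrC, predict(mod_gamma, newdata = grid, type = "response"), col = "red")
lines(grid$nrC, predict(mod_sqrt, newdata = grid, type = "response"), col = "blue")
legend("bottomright", legend = c("Gamma (inverse link)", "gaussian + sqrt(nrC)"),
       col = c("red", "blue"), lty = 1)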


Forecasting with ARIMA and dummy variables

I am attempting to include a dummy regressor that marks the beginning of the pandemic and run a regression with ARIMA errors.
My dataset covers breaking-and-entering incidents in Toronto from 2014 to 2021. The issue is that the trend takes a turn around 2020 due to COVID-19.
auto.arima() gives me an ARIMA(1,0,1) model; it does not take the impact of COVID-19 into account and forecasts an implied return to the series average.
When I try to include a dummy regressor that marks the beginning of the pandemic and run a regression with ARIMA errors, I get the following error:
In ifelse(time(BEDATA_GROUPEDtsssarima) >= yearmonth("2020-03"), :
Incompatible methods ("Ops.ts", ">=.vctrs_vctr") for ">="
Code:
# Create a binary time series that indicates the start of the pandemic
library(fpp3)
library(forecast)
library(zoo)
# Check if timeseries
class(BEDATA_GROUPED)
#Convert timeseries
BEDATA_GROUPEDtsssarima <- ts(BEDATA_GROUPED[,2], frequency = 12, start = c(2014, 1))
class(BEDATA_GROUPEDtsssarima)
#Plot
forecast::autoplot(BEDATA_GROUPEDtsssarima)
# Assume that the pandemic began in March 2020
pandemic_dummy <- ifelse(time(BEDATA_GROUPEDtsssarima) >= yearmonth("2020-03"), 1, 0)
# Use auto.arima() to fit an ARIMA model with the dummy variable as an exogenous variable
beddatamodel <- auto.arima(BEDATA_GROUPEDtsssarima, xreg = pandemic_dummy, ic="aic", trace = TRUE)
# Create a binary time series for the forecast period that includes the pandemic dummy variable
forecast_period <- time(BEDATA_GROUPEDtsssarima)["2022/01/01/":"2023/12/31/"]
pandemic_dummy_forecast <- ifelse(forecast_period >= yearmonth("2020-03"), 1, 0)
# Use the forecast()
forecast(pandemic_dummy_forecast)
Dataset:
structure(list(occurrence_yrmn = c("2014-January", "2014-February",
"2014-March", "2014-April", "2014-May", "2014-June", "2014-July",
"2014-August", "2014-September", "2014-October", "2014-November",
"2014-December", "2015-January", "2015-February", "2015-March",
"2015-April", "2015-May", "2015-June", "2015-July", "2015-August",
"2015-September", "2015-October", "2015-November", "2015-December",
"2016-January", "2016-February", "2016-March", "2016-April",
"2016-May", "2016-June", "2016-July", "2016-August", "2016-September",
"2016-October", "2016-November", "2016-December", "2017-January",
"2017-February", "2017-March", "2017-April", "2017-May", "2017-June",
"2017-July", "2017-August", "2017-September", "2017-October",
"2017-November", "2017-December", "2018-January", "2018-February",
"2018-March", "2018-April", "2018-May", "2018-June", "2018-July",
"2018-August", "2018-September", "2018-October", "2018-November",
"2018-December", "2019-January", "2019-February", "2019-March",
"2019-April", "2019-May", "2019-June", "2019-July", "2019-August",
"2019-September", "2019-October", "2019-November", "2019-December",
"2020-January", "2020-February", "2020-March", "2020-April",
"2020-May", "2020-June", "2020-July", "2020-August", "2020-September",
"2020-October", "2020-November", "2020-December", "2021-January",
"2021-February", "2021-March", "2021-April", "2021-May", "2021-June",
"2021-July", "2021-August", "2021-September", "2021-October",
"2021-November", "2021-December"), MCI = c(586, 482, 567, 626,
625, 610, 576, 634, 636, 663, 657, 556, 513, 415, 510, 542, 549,
618, 623, 666, 641, 632, 593, 617, 541, 523, 504, 536, 498, 552,
522, 519, 496, 541, 602, 570, 571, 492, 560, 525, 507, 523, 593,
623, 578, 657, 683, 588, 664, 582, 619, 512, 630, 644, 563, 654,
635, 732, 639, 748, 719, 567, 607, 746, 739, 686, 805, 762, 696,
777, 755, 675, 704, 617, 732, 609, 464, 487, 565, 609, 513, 533,
505, 578, 526, 418, 428, 421, 502, 452, 509, 492, 478, 469, 457,
457)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-96L))
I see you have loaded the fpp3 library, so I've had a go using the tidyverts tools, fitting three models: a plain ARIMA, a plain regression on the pandemic dummy variable, and a dynamic regression using both ARIMA errors and the dummy variable. (The error you see comes from comparing time() of a ts object, which is plain numeric, with a yearmonth vector; keeping everything in a tsibble avoids mixing the two.)
Hope this helps! :-)
library(tsibble)
library(fable)
library(fabletools)
library(feasts)
library(dplyr)
Create a tsibble:
BEDATA_GROUPED <- BEDATA_GROUPED |>
mutate(Month = yearmonth(occurrence_yrmn)) |>
as_tsibble(index = Month)
autoplot(BEDATA_GROUPED)
Assume that the pandemic began in March 2020 and create a dummy variable:
pandemic_start <- yearmonth("2020-03-01")
BEDATA_GROUPED <- BEDATA_GROUPED |>
mutate(pandemic_dummy = ifelse(Month >= pandemic_start, 1, 0))
Work up a plain ARIMA:
BEDATA_GROUPED_arima <- BEDATA_GROUPED |>
model(ARIMA(MCI, stepwise = FALSE))
BEDATA_GROUPED_arima |>
gg_tsresiduals()
BEDATA_GROUPED_arima |>
forecast(h = 5) |>
autoplot()
Work up a plain regression:
BEDATA_GROUPED_TSLM <- BEDATA_GROUPED |>
model(TSLM(MCI ~ pandemic_dummy)) |>
report()
BEDATA_GROUPED_TSLM |>
gg_tsresiduals()
Make a data set to predict on:
new_data <- structure(list(Month = structure(c(18993, 19024, 19052, 19083,
19113), class = c("yearmonth", "vctrs_vctr")), pandemic_dummy = c(1,
1, 1, 1, 1)), class = c("tbl_ts", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), key = structure(list(.rows = structure(list(
1:5), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), index = structure("Month", ordered = TRUE), index2 = "Month", interval = structure(list(
year = 0, quarter = 0, month = 1, week = 0, day = 0, hour = 0,
minute = 0, second = 0, millisecond = 0, microsecond = 0,
nanosecond = 0, unit = 0), .regular = TRUE, class = c("interval",
"vctrs_rcrd", "vctrs_vctr")))
Forecast plain regression:
BEDATA_GROUPED_TSLM |>
forecast(new_data = new_data) |>
autoplot()
Work up a dynamic regression with ARIMA errors and the pandemic dummy variable:
BEDATA_GROUPED_dyn_ARIMA <- BEDATA_GROUPED |>
model(ARIMA(MCI ~ pandemic_dummy)) |>
report()
BEDATA_GROUPED_dyn_ARIMA |>
gg_tsresiduals()
BEDATA_GROUPED_dyn_ARIMA |>
forecast(new_data = new_data) |>
autoplot()
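If you want to compare the three specifications side by side, one option (a sketch, not part of the original workflow) is to fit them in a single model() call and look at the information criteria from glance(); both ARIMA and TSLM report AIC, AICc and BIC:
BEDATA_GROUPED |>
  model(
    arima       = ARIMA(MCI, stepwise = FALSE),
    tslm_dummy  = TSLM(MCI ~ pandemic_dummy),
    arima_dummy = ARIMA(MCI ~ pandemic_dummy)
  ) |>
  glance() |>
  select(.model, AIC, AICc, BIC)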

Interpolate with splines without surpassing next value R

I have a dataset of accumulated data. I am trying to interpolate some missing values, but at some points the interpolated value is larger than the next observed value. This is an example of my data:
library(tibble)
dat <- tibble(day=c(1:30),
value=c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
NA, NA, 487, 487, 487, 487))
My dataset is quite long, and when I use smooth.spline to interpolate the missing values I get a value greater than the next observation, which is quite absurd considering I am dealing with accumulated data. This is the output I get:
value.smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335,
359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765, 447.1823,
444, 464, 487, 487, 487, 521.6235, 526.3715, 487, 487, 487, 487)
My question is: can you somehow set boundaries for the interpolation so the result is reliable? If so, how could you do it?
You have monotonic data to interpolate, so we can use the "hyman" method in spline():
x <- dat$day
yi <- y <- dat$value
naInd <- is.na(y)
yi[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd], method = "hyman")$y
plot(x, y, pch = 19) ## non-NA data (black)
points(x[naInd], yi[naInd], pch = 19, col = 2) ## interpolation at NA (red)
Package zoo has a number of functions to fill NA values, one of which is na.spline. So as G. Grothendieck (a wizard for time series) suggests, the following does the same:
library(zoo)
library(dplyr)
dat %>% mutate(value.interp = na.spline(value, method = "hyman"))
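As a quick sanity check (a sketch using the objects above): the Hyman filter keeps the interpolant monotone, so the filled series never overshoots the next observation:
filled <- dat %>% mutate(value.interp = na.spline(value, method = "hyman"))
all(diff(filled$value.interp) >= 0)  # TRUE: non-decreasing, so no interpolated value exceeds the next one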

interpolation - r approx() returns NA when using xout option

Here's my data, representing reflectance observed at each wavelength (wl):
wl <- c(442.7, 492.4, 559.8, 664.6, 704.1, 740.5, 782.8, 832.8, 864.7, 945.1, 1613.7, 2202.4)
pt_1 <- c(0.024, 0.0265, 0.0575, 0.0457, 0.1003, 0.2672, 0.3447, 0.3732, 0.3814, 0.3739, 0.1704, 0.0907)
pt_2 <- c(0.0234, 279, 57, 0.0454, .0963, .2642, 3446, .365, 0.3864, .3919, .1726, 0.0884)
mydata <- data.frame(wl, pt_1, pt_2)
This does what I expect, if not completely what I want, returning a data frame with interpolated values for all columns:
interp <- as.data.frame(apply(mydata, 2, function(x) approx(x, y = NULL, method = "linear", n = 20)$y))
I want to specify the xout values (as per my understanding of ?approx):
interp1 <- as.data.frame(apply(mydata, 2, function(x) approx(x, y = NULL, method = "linear", xout = c(450, 550, 670, 700, 704, 706, 740.5, 750, 950, 1510))$y)) #
But this produces only NAs. Can anyone see my error?
Even better, I would like to keep the original wavelengths by including them in xout:
wl_orig <- as.vector(mydata$wl)
interp1 <- as.data.frame(apply(mydata, 2, function(x) approx(x, y = NULL, method = "linear", xout = c(wl_orig, 450, 550, 670, 700, 704, 706, 740.5, 750, 950, 1510))$y)) #
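A likely explanation, offered as a hedged note: with y = NULL, approx() treats the supplied vector as the y values and uses its indices 1:12 as the x coordinates, so xout values such as 450 fall outside 1:12 and come back as NA (the first call works because its xout defaults to n points inside that index range). Interpolating each reflectance column against wl instead could look like this sketch (xout_new and interp_fixed are just illustrative names):
xout_new <- c(450, 550, 670, 700, 704, 706, 740.5, 750, 950, 1510)
xout_all <- sort(c(mydata$wl, xout_new))  # keep the original wavelengths as well
interp_fixed <- data.frame(
  wl = xout_all,
  lapply(mydata[c("pt_1", "pt_2")],
         function(y) approx(x = mydata$wl, y = y, method = "linear", xout = xout_all)$y)
)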

Fitting a sigmoidal curve to points with ggplot

I have a simple dataframe for the response measurements from a drug treatment at various doses:
drug <- c("drug_1", "drug_1", "drug_1", "drug_1", "drug_1",
"drug_1", "drug_1", "drug_1", "drug_2", "drug_2", "drug_2",
"drug_2", "drug_2", "drug_2", "drug_2", "drug_2")
conc <- c(100.00, 33.33, 11.11, 3.70, 1.23, 0.41, 0.14,
0.05, 100.00, 33.33, 11.11, 3.70, 1.23, 0.41, 0.14, 0.05)
mean_response <- c(1156, 1833, 1744, 1256, 1244, 1088, 678, 489,
2322, 1867, 1333, 944, 567, 356, 200, 177)
std_dev <- c(117, 317, 440, 200, 134, 38, 183, 153, 719,
218, 185, 117, 166, 167, 88, 50)
df <- data.frame(drug, conc, mean_response, std_dev)
I can plot these points using the following code and get the basic foundation of the visualization that I would like:
library(ggplot2)
p <- ggplot(data=df, aes(y=mean_response, x= conc, color = drug)) +
geom_pointrange(aes(ymax = (mean_response + std_dev), ymin = (mean_response - std_dev))) +
scale_x_log10()
p
The next thing I would like to do with these data is add a sigmoidal curve to the plot that fits the plotted points for each drug. Following that, I would like to calculate the EC50 for this curve.
I realize I may not have the entire range of the sigmoidal curve in my data, but I am hoping to get the best estimate I can with what I have. Also, the final point for drug_1 does not follow the expected trend of a sigmoidal curve, but this is actually not unexpected, as the solutions the drugs are in can inhibit responses at high concentrations (each drug is in a different solution). I would like to exclude this point from the data.
I am getting stuck at the step of fitting a sigmoidal curve to my data. I have looked over some other solutions to fitting sigmoidal curves to data but none seem to work.
One post that is very close to my problem is this:
(sigmoid) curve fitting glm in r
Based on it, I tried:
p + geom_smooth(method = "glm", family = binomial, se = FALSE)
This gives the following error, and seems to default to plotting straight lines:
`geom_smooth()` using formula 'y ~ x'
Warning message:
Ignoring unknown parameters: family
I have also tried the solution from this link:
Fitting a sigmoidal curve to this oxy-Hb data
In this case, I get the following error:
Computation failed in `stat_smooth()`:
Convergence failure: singular convergence (7)
and no lines are added to the plot.
I have tried looking up both of these errors but cannot seem to find a reason that makes sense with my data.
Any help would be much appreciated!
As I said in a comment, I would only use geom_smooth() for a very easy problem; as soon as I run into trouble I use nls instead.
My answer is very similar to @Duck's, with the following differences:
I show both unweighted and (inverse-variance) weighted fits.
In order to get the weighted fits to work, I had to use the nls2 package, which provides a slightly more robust algorithm.
I use SSlogis() to get automatic (self-starting) initial parameter selection.
I do all of the prediction outside of ggplot2, then feed it into geom_line().
p1 <- nls(mean_response~SSlogis(conc,Asym,xmid,scal),data=df,
subset=(drug=="drug_1" & conc<100)
## , weights=1/std_dev^2 ## error in qr.default: NA/NaN/Inf ...
)
library(nls2)
p1B <- nls2(mean_response~SSlogis(conc,Asym,xmid,scal),data=df,
subset=(drug=="drug_1" & conc<100),
weights=1/std_dev^2)
p2 <- update(p1,subset=(drug=="drug_2"))
p2B <- update(p1B,subset=(drug=="drug_2"))
pframe0 <- data.frame(conc=10^seq(log10(min(df$conc)),log10(max(df$conc)), length.out=100))
pp <- rbind(
data.frame(pframe0,mean_response=predict(p1,pframe0),
drug="drug_1",wts=FALSE),
data.frame(pframe0,mean_response=predict(p2,pframe0),
drug="drug_2",wts=FALSE),
data.frame(pframe0,mean_response=predict(p1B,pframe0),
drug="drug_1",wts=TRUE),
data.frame(pframe0,mean_response=predict(p2B,pframe0),
drug="drug_2",wts=TRUE)
)
library(ggplot2); theme_set(theme_bw())
(ggplot(df,aes(conc,mean_response,colour=drug)) +
geom_pointrange(aes(ymin=mean_response-std_dev,
ymax=mean_response+std_dev)) +
scale_x_log10() +
geom_line(data=pp,aes(linetype=wts),size=2)
)
I believe the EC50 is equivalent to the xmid parameter ... note the large differences between weighted and unweighted estimates ...
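To put numbers on that, a small sketch using the fits above to pull out the xmid estimates (the EC50 on the concentration scale, under the assumption that EC50 = xmid):
# xmid (EC50) from each fit: unweighted vs inverse-variance weighted
sapply(list(drug1_unweighted = p1, drug1_weighted = p1B,
            drug2_unweighted = p2, drug2_weighted = p2B),
       function(m) coef(m)[["xmid"]])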
I would suggest the following approach, which is close to what you want. I also tried a binomial-family fit for your data, but there are issues because the responses are not values between 0 and 1; in that case you would need an additional variable to define the respective proportions. The code in the following lines uses a nonlinear approximation to sketch your output.
Initially, the data:
library(ggplot2)
#Data
df <- structure(list(drug = c("drug_1", "drug_1", "drug_1", "drug_1",
"drug_1", "drug_1", "drug_1", "drug_1", "drug_2", "drug_2", "drug_2",
"drug_2", "drug_2", "drug_2", "drug_2", "drug_2"), conc = c(100,
33.33, 11.11, 3.7, 1.23, 0.41, 0.14, 0.05, 100, 33.33, 11.11,
3.7, 1.23, 0.41, 0.14, 0.05), mean_response = c(1156, 1833, 1744,
1256, 1244, 1088, 678, 489, 2322, 1867, 1333, 944, 567, 356,
200, 177), std_dev = c(117, 317, 440, 200, 134, 38, 183, 153,
719, 218, 185, 117, 166, 167, 88, 50)), class = "data.frame", row.names = c(NA,
-16L))
In nonlinear least squares you need to define initial values for the parameter search. We use the following code with the base function nls() to obtain those initial values:
#Drug 1
fm1 <- nls(log(mean_response) ~ log(a/(1+exp(-b*(conc-c)))), df[df$drug=='drug_1',], start = c(a = 1, b = 1, c = 1))
#Drug 2
fm2 <- nls(log(mean_response) ~ log(a/(1+exp(-b*(conc-c)))), df[df$drug=='drug_2',], start = c(a = 1, b = 1, c = 1))
With these initial parameter estimates, we sketch the plot using geom_smooth(), which again uses nls() to find the final parameters:
#Plot
ggplot(data=df, aes(y=mean_response, x= conc, color = drug)) +
geom_pointrange(aes(ymax = (mean_response + std_dev), ymin = (mean_response - std_dev))) +
geom_smooth(data = df[df$drug=='drug_1',],method = "nls", se = FALSE,
formula = y ~ a/(1+exp(-b*(x-c))),
method.args = list(start = coef(fm1),
algorithm='port'),
color = "tomato")+
geom_smooth(data = df[df$drug=='drug_2',],method = "nls", se = FALSE,
formula = y ~ a/(1+exp(-b*(x-c))),
method.args = list(start = coef(fm2),
algorithm='port'),
color = "cyan3")
The output:

How to smooth data of increasing noise

Chemist here (so not very good with statistical analysis) and novice in R:
I have various sets of data where the yield of a reaction is monitored with time such as:
The data:
df <- structure(list(time = c(15, 30, 45, 60, 75, 90, 105, 120, 135,
150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330,
345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525,
540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720,
735, 750, 765, 780, 795, 810, 825, 840, 855, 870, 885, 900, 915,
930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080,
1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245,
1260, 1275, 1290, 1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410,
1425, 1440, 1455, 1470, 1485, 1500, 1515, 1530, 1545, 1560, 1575,
1590, 1605, 1620, 1635, 1650, 1665, 1680, 1695, 1710, 1725, 1740,
1755, 1770, 1785, 1800, 1815, 1830, 1845, 1860, 1875, 1890, 1905,
1920, 1935, 1950, 1965, 1980, 1995, 2010, 2025, 2040, 2055, 2070,
2085, 2100, 2115, 2130), yield = c(9.3411, 9.32582, 10.5475,
13.5358, 17.3376, 16.7444, 20.7234, 19.8374, 24.327, 27.4162,
27.38, 31.3926, 29.3289, 32.2556, 33.0025, 35.3358, 35.8986,
40.1859, 40.3886, 42.2828, 41.23, 43.8108, 43.9391, 43.9543,
48.0524, 47.8295, 48.674, 48.2456, 50.2641, 50.7147, 49.6828,
52.8877, 51.7906, 57.2553, 53.6175, 57.0186, 57.6598, 56.4049,
57.1446, 58.5464, 60.7213, 61.0584, 57.7481, 59.9151, 64.475,
61.2322, 63.5167, 64.6289, 64.4245, 62.0048, 65.5821, 65.8275,
65.7584, 68.0523, 65.4874, 68.401, 68.1503, 67.8713, 69.5478,
69.9774, 73.4199, 66.7266, 70.4732, 67.5119, 69.6107, 70.4911,
72.7592, 69.3821, 72.049, 70.2548, 71.6336, 70.6215, 70.8611,
72.0337, 72.2842, 76.0792, 75.2526, 72.7016, 73.6547, 75.6202,
76.5013, 74.2459, 76.033, 78.4803, 76.3058, 73.837, 74.795, 76.2126,
75.1816, 75.3594, 79.9158, 77.8157, 77.8152, 75.3712, 78.3249,
79.1198, 77.6184, 78.1244, 78.1741, 77.9305, 79.7576, 78.0261,
79.8136, 75.5314, 80.2177, 79.786, 81.078, 78.4183, 80.8013,
79.3855, 81.5268, 78.416, 78.9021, 79.9394, 80.8221, 81.241,
80.6111, 79.7504, 81.6001, 80.7021, 81.1008, 82.843, 82.2716,
83.024, 81.0381, 80.0248, 85.1418, 83.1229, 83.3334, 83.2149,
84.836, 79.5156, 81.909, 81.1477, 85.1715, 83.7502, 83.8336,
83.7595, 86.0062, 84.9572, 86.6709, 84.4124)), .Names = c("time",
"yield"), row.names = c(NA, -142L), class = "data.frame")
What I want to do with the data:
I need to smooth the data in order to plot the 1st derivative. In the paper the author mentioned that one can fit a high-order polynomial and use that for the processing, which I think is wrong since we don't really know the true relationship between time and yield, and it is definitely not polynomial. I tried it regardless and, as expected, the plot of the derivative did not make any chemical sense. Next I looked into loess using loes <- loess(yield ~ time, data = df, span = 0.9), which gave a much better fit. However, the best results so far were obtained with:
spl <- smooth.spline(df$time, y = df$yield, cv = TRUE)
predspl <- as.data.frame(predict(spl))        # smoothed fit
colnames(predspl) <- c('time', 'yield')
pred.der <- as.data.frame(predict(spl, deriv = 1))  # first derivative
colnames(pred.der) <- c('time', 'yield')
which gave the best fit especially in the initial data points (by visual inspection).
The problem I have:
The issue, however, is that the derivative looks really good only up to t = 500 s and then starts wiggling more and more towards the end. This shouldn't happen from a chemistry point of view, and it is just a result of overfitting towards the end of the data due to the increasing noise. I know this because for some experiments, which I performed 3 times and averaged (so the noise decreased), the wiggling in the derivative plot is much smaller.
What I have tried so far:
I tried different values of spar, which smooths the later data correctly but causes a poor fit for the initial data (which are the most important). I also tried reducing the number of knots, but I got a result similar to the one from changing the spar value. What I think I need is a larger number of knots at the beginning, smoothly decreasing to a small number of knots towards the end, to avoid that overfitting.
The question:
Is my reasoning correct here? Does anyone know how I can achieve the above effect in order to get a smooth derivative without any wiggling? Do I need to try a different fit other than the spline? I have attached a picture at the end showing the derivative from smooth.spline vs time, with a black line (drawn by hand) of what it should look like. Thank you in advance for your help.
I think you're on the right track in having more closely spaced knots for the spline at the start of the curve. You can specify knot locations for smooth.spline using all.knots (at least on R >= 3.4.3; I skimmed the R release notes but couldn't pinpoint the version where this became available).
Below is an example, and the resulting smoother fit for the derivative, after some manual work trying out different knot positions:
with(df, {
kn <- c(0, c(50, 100, 200, 350, 500, 1500) / max(time), 1)
s <- smooth.spline(time, yield, cv = T)
s2 <- smooth.spline(time, yield, all.knots = kn)
ds <- predict(s, d = 1)
ds2 <- predict(s2, d = 1)
np <- list(mfrow = c(2, 1), mar = c(4, 4, 1, 2))
withr::with_par(np, {
plot(time, yield)
lines(s)
lines(s2, lty = 2, col = 'red')
plot(ds, type = 'l', ylim = c(0, 0.15))
lines(ds2, lty = 2, col = 'red')
})
})
You can probably fine tune the locations further, but I wouldn't be too concerned about it. The primary fits are already near enough indistinguishable, and I'd say you're asking quite a lot from these data in terms of identifying details about the derivative (this should be evident if you plot(time[-1], diff(yield) / diff(time)) which gives you an impression about the level of information your data carry about the derivative).
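To make that last check concrete, here is a minimal sketch of the raw finite-difference derivative (it uses only the df from the question):
# Raw finite-difference estimate of d(yield)/d(time): noisy, but it shows how much
# information the data actually carry about the derivative
with(df, plot(time[-1], diff(yield) / diff(time),
              xlab = "time", ylab = "d(yield)/d(time)"))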
Created on 2018-02-15 by the reprex package (v0.2.0).
