Fitted vs. Residuals in a monthly time series linear model - R

I am trying to plot the classic "Fitted vs Residual" plot from a time series linear model on the fancy time series in the fpp package:
structure(c(1664.81, 2397.53, 2840.71, 3547.29, 3752.96, 3714.74,
4349.61, 3566.34, 5021.82, 6423.48, 7600.6, 19756.21, 2499.81,
5198.24, 7225.14, 4806.03, 5900.88, 4951.34, 6179.12, 4752.15,
5496.43, 5835.1, 12600.08, 28541.72, 4717.02, 5702.63, 9957.58,
5304.78, 6492.43, 6630.8, 7349.62, 8176.62, 8573.17, 9690.5,
15151.84, 34061.01, 5921.1, 5814.58, 12421.25, 6369.77, 7609.12,
7224.75, 8121.22, 7979.25, 8093.06, 8476.7, 17914.66, 30114.41,
4826.64, 6470.23, 9638.77, 8821.17, 8722.37, 10209.48, 11276.55,
12552.22, 11637.39, 13606.89, 21822.11, 45060.69, 7615.03, 9849.69,
14558.4, 11587.33, 9332.56, 13082.09, 16732.78, 19888.61, 23933.38,
25391.35, 36024.8, 80721.71, 10243.24, 11266.88, 21826.84, 17357.33,
15997.79, 18601.53, 26155.15, 28586.52, 30505.41, 30821.33, 46634.38,
104660.67), .Tsp = c(1987, 1993.91666666667, 12), class = "ts")
library(fpp)
fit = tslm(fancy ~ trend + season)
plot(fitted(fit), residuals(fit), xlab = "Predicted scores", ylab = "Residuals")
The plot is messy because fitted(fit) and residuals(fit) are themselves monthly time series objects, so the scatterplot does not come out right.
How can I display the scatterplot the way I would for a normal lm?
Thanks for helping.

Thanks everybody,
I found a quick workaround for this, by converting the ts objects to vectors before plotting:
fit_vector <- as.vector(fitted(fit))
fit_residuals <- as.vector(residuals(fit))
plot(fit_vector, fit_residuals, xlab = "Predicted scores", ylab = "Residuals")
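Equivalently (a small sketch, reusing fit from above), as.numeric() strips the ts attributes inline, so no intermediate vectors are needed:
plot(as.numeric(fitted(fit)), as.numeric(residuals(fit)),
     xlab = "Predicted scores", ylab = "Residuals")
abline(h = 0, lty = 2)   # horizontal reference line at zero residual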

Plot Lines instead of points in scatterplot

I want to replace the points in my graph with a line, like in the first picture; the second picture is what I currently have.
It's not quite what I'm looking for: I want a smooth line without the points.
I think I have to use predict() for the 1/x curve, but I am not sure how.
Assuming a 1/x relationship fits the data well, one can use the lm() function to fit the desired function y = a/x + b and then use the predict() function to estimate the desired points.
If a more complicated nonlinear function is required to fit the data, then nls() may be required.
x<- c(176.01685819061, 21.6704613594849, 19.007554742708, 50.1865574864131, 17.6174002411188, 40.2758022496774, 11.0963214407251, 1249.94375253114, 694.894678288085, 339.786950220117, 42.1452961176151, 220.352895161601, 19.6303352674776, 9.10350287678884, 10.6222946396451, 44.1984352318898, 21.8069112975004, 42.1237630342764, 22.7551891190248, 12.9587850506626, 12.0207189111152, 20.2704921282476, 13.3441156357956, 9.13092569988769, 1781.08346869568, 71.2690023512206, 80.2376892286713, 344.114362037227, 208.830841645638, 91.1778810401913, 2220.0120768657, 41.4820962277111, 16.5730025748281, 32.30173229022, 108.703930214512, 51.6770035143256, 709.071405759588, 87.9618878732223, 10.4198968123037, 34.4951840238729, 57.8603720445067, 72.3289197551429, 30.2366643066749, 23.8696161364716, 270.014690419247, 13.8170113452005, 39.5159584479013, 27.764841260433, 18.0311836472615, 40.5709477295999, 33.1888820958952, 9.03112843931787, 4.63738971549635, 12.7591169313099, 4.7998894219979, 8.93458248803248, 7.33904760386628, 12.0940344070925, 7.17364602165948, 6.514191844409, 9.69911157978057, 6.57874454980745, 7.90556524435596)
y<- c(0.02840637, 0.230728821, 0.2630533, 0.099628272, 0.28381032, 0.12414402, 0.45059978, 0.00400018, 0.00719533500000001, 0.014715103086687, 0.118637201789886, 0.022690875, 0.254707825, 0.54923913, 0.470708088, 0.113126176837872, 0.22928510745, 0.118697847481752, 0.219730100850697, 0.38583864, 0.4159485, 0.24666396693114, 0.374696992776912, 0.547589605297248, 0.00280728, 0.070156727820596, 0.062314855376136, 0.01453005323695, 0.02394282358199, 0.0548378613646, 0.00225224, 0.120533928, 0.301695482, 0.15479046, 0.045996497, 0.096754836, 0.00705147600000001, 0.0568428, 0.47985120103071, 0.14494777, 0.08641493, 0.069128642, 0.165362156, 0.20947132, 0.018517511, 0.36187275779699, 0.126531158458224, 0.180083867690804, 0.277297380904852, 0.1232408972382, 0.15065285976048, 0.55364067, 1.07819275643191, 0.39187665, 1.04169066418176, 0.55962324, 0.68128731, 0.41342697, 0.69699564, 0.76755492, 0.515511133042674, 0.760023430328564, 0.632465844687028)
# data frame for prediction
df <- data.frame(x = sort(x))
# fit model y = a/x + b
model <- lm(y ~ I(1/x))
# summary(model)
# plot the fitted curve
plot(df$x, predict(model, df), type = "l", col = "blue")
# optional: overlay the raw data
points(x, y)
Update - response to comments
x is sorted in the data frame so that the points are plotted in order. If not, the line could go from x = 1 to x = 100, back to x = 10, etc., making a mess. Try removing the sort and see what happens.
The I(1/x) term tells lm() to apply the inverse transform to x first and then perform the least squares regression.
predict(model, df) ends up as the y-axis label because that expression is what was passed to plot(). To change this, assign the output of predict() to a better variable name and plot that, or use the ylab = option.
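For example (a small sketch reusing model and df from above):
pred_y <- predict(model, df)            # store predictions under a clearer name
plot(df$x, pred_y, type = "l", col = "blue",
     xlab = "x", ylab = "Predicted y")  # or keep predict() inline and set ylab directly
points(x, y)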
For smoothing, you can fit a linear model as follows:
m <- lm(AM_cost_resorb~I(1/AM_leafP), data=data)
Then extract the predicted values on a new data set that covers the range of the exposure variable.
newx <- seq(min(data$AM_leafP), max(data$AM_leafP), by=0.01)
pr <- predict(m, newdata=data.frame(AM_leafP=newx))
And visualize:
plot(AM_cost_resorb~AM_leafP, data=data, type="p", pch= 15, col="red",ylab="Cost of resorption (kg C m^-2 yr^-1)", xlab="leaf P before senescence (g P/m2)", ylim=c(0,500), las=1)
lines(newx, y=pr, col="blue", lwd=2)
Data:
data <- structure(list(AM_cost_resorb = c(176.01685819061, 21.6704613594849,
19.007554742708, 50.1865574864131, 17.6174002411188, 40.2758022496774,
11.0963214407251, 1249.94375253114, 694.894678288085, 339.786950220117,
42.1452961176151, 220.352895161601, 19.6303352674776, 9.10350287678884,
10.6222946396451, 44.1984352318898, 21.8069112975004, 42.1237630342764,
22.7551891190248, 12.9587850506626, 12.0207189111152, 20.2704921282476,
13.3441156357956, 9.13092569988769, 1781.08346869568, 71.2690023512206,
80.2376892286713, 344.114362037227, 208.830841645638, 91.1778810401913,
2220.0120768657, 41.4820962277111, 16.5730025748281, 32.30173229022,
108.703930214512, 51.6770035143256, 709.071405759588, 87.9618878732223,
10.4198968123037, 34.4951840238729, 57.8603720445067, 72.3289197551429,
30.2366643066749, 23.8696161364716, 270.014690419247, 13.8170113452005,
39.5159584479013, 27.764841260433, 18.0311836472615, 40.5709477295999,
33.1888820958952, 9.03112843931787, 4.63738971549635, 12.7591169313099,
4.7998894219979, 8.93458248803248, 7.33904760386628, 12.0940344070925,
7.17364602165948, 6.514191844409, 9.69911157978057, 6.57874454980745,
7.90556524435596), AM_leafP = c(0.02840637, 0.230728821, 0.2630533,
0.099628272, 0.28381032, 0.12414402, 0.45059978, 0.00400018,
0.00719533500000001, 0.014715103086687, 0.118637201789886, 0.022690875,
0.254707825, 0.54923913, 0.470708088, 0.113126176837872, 0.22928510745,
0.118697847481752, 0.219730100850697, 0.38583864, 0.4159485,
0.24666396693114, 0.374696992776912, 0.547589605297248, 0.00280728,
0.070156727820596, 0.062314855376136, 0.01453005323695, 0.02394282358199,
0.0548378613646, 0.00225224, 0.120533928, 0.301695482, 0.15479046,
0.045996497, 0.096754836, 0.00705147600000001, 0.0568428, 0.47985120103071,
0.14494777, 0.08641493, 0.069128642, 0.165362156, 0.20947132,
0.018517511, 0.36187275779699, 0.126531158458224, 0.180083867690804,
0.277297380904852, 0.1232408972382, 0.15065285976048, 0.55364067,
1.07819275643191, 0.39187665, 1.04169066418176, 0.55962324, 0.68128731,
0.41342697, 0.69699564, 0.76755492, 0.515511133042674, 0.760023430328564,
0.632465844687028)), class = "data.frame", row.names = c(NA,
-63L))

Logistic Regression in Sigmoid Data R

Overview
Hello, I am working on a project that involves displaying a "best fit line" over raw data. I have very little statistical experience, so I am unsure which methodologies and functions to pursue. I am also unsure what the general output should be.
I am working with sigmoidal data, which can be noisy at times. I was informed that I will end up using logistic regression rather than linear regression.
Goal
-Plot the approximated logistic regression over the raw data using ggplot.
Sample dput() Data
structure(list(Temperature = c(0.35937, 0.3623, 0.88796, 1.38134,
1.89773, 2.40185, 2.90063, 3.40432, 3.92358, 4.40969, 4.91506,
5.42822, 5.93337, 6.43823, 6.95019, 7.46044, 7.95995, 8.45434,
8.98095, 9.48974, 10.00073, 10.5122, 11.00073, 11.51513, 12.03613,
12.54614, 13.04028, 13.5476, 14.04397, 14.58032, 15.07253, 15.58715,
16.09963, 16.60449, 17.11501, 17.60693, 18.12231, 18.63134, 19.14575,
19.63745, 20.16479, 20.65478, 21.15478, 21.64843, 22.15872, 22.65649,
23.1575, 23.67309, 24.17651, 24.67065, 25.19387, 25.69558, 26.19238,
26.7019, 27.20193, 27.70242, 28.19778, 28.70629, 29.19799, 29.69409,
30.20312, 30.70898, 31.21337, 31.71975, 32.21874, 32.7351, 33.22045,
33.74001, 34.24926, 34.73901, 35.26269, 35.75146, 36.26806, 36.76562,
37.28637, 37.77514, 38.29202, 38.78686, 39.2954, 39.80761, 40.31689,
40.81985, 41.31371, 41.8225, 42.3291, 42.85546, 43.3562, 43.87304,
44.37011, 44.88256, 45.38891, 45.89919, 46.40942, 46.92089, 47.42651,
47.94579, 48.479, 48.96218, 49.47411, 49.9851, 50.49438, 51.02368,
51.52905, 52.04907, 52.55493, 53.05493, 53.57543, 54.07836, 54.59548,
55.12451, 55.6206, 56.12866, 56.64379, 57.14745, 57.65945, 58.17553,
58.68432, 59.18408, 59.70019, 60.22167, 60.71703, 61.24246, 61.77538,
62.26391, 62.77612, 63.29614, 63.77807, 64.30053, 64.81689, 65.33279,
65.85131, 66.35229, 66.86694, 67.3933, 67.91723, 68.41577, 68.9436,
69.44677, 69.95141, 70.46655, 71.01635, 71.49514, 72.00906, 72.51269,
73.03542, 73.5498, 74.07055, 74.5747, 75.1018, 75.63061, 76.15283,
76.67504, 77.17822, 77.68456, 78.19848, 78.69775, 79.2124, 79.70727,
80.22656, 80.76611, 81.26049, 81.78369, 82.29101, 82.81469, 83.33544,
83.87496, 84.32372, 84.85815, 85.45971, 85.89111, 86.3623, 86.93578
), Absorbance = c(1.81071, 1.81388, 1.81683, 1.81888, 1.82262,
1.82458, 1.82688, 1.82958, 1.83234, 1.83512, 1.83743, 1.84024,
1.84237, 1.8451, 1.84772, 1.85036, 1.85254, 1.85495, 1.85805,
1.86069, 1.86304, 1.86508, 1.86808, 1.87077, 1.87352, 1.87564,
1.87863, 1.88164, 1.88402, 1.88598, 1.88886, 1.89159, 1.89392,
1.8968, 1.8995, 1.90179, 1.90508, 1.90725, 1.9098, 1.91265, 1.91516,
1.9173, 1.92062, 1.92298, 1.92563, 1.92855, 1.9307, 1.93383,
1.93642, 1.93903, 1.94168, 1.94381, 1.9462, 1.94994, 1.95289,
1.95581, 1.95902, 1.96158, 1.96398, 1.96661, 1.96978, 1.97321,
1.97583, 1.97916, 1.98271, 1.98456, 1.98892, 1.99297, 1.99605,
1.99921, 2.0035, 2.00686, 2.01138, 2.01495, 2.0189, 2.02396,
2.0282, 2.03317, 2.03781, 2.04254, 2.0479, 2.05363, 2.05974,
2.06564, 2.07107, 2.07914, 2.08561, 2.09258, 2.1002, 2.10902,
2.11876, 2.12582, 2.13495, 2.14506, 2.15465, 2.16517, 2.17522,
2.18627, 2.19739, 2.20907, 2.22094, 2.23388, 2.24563, 2.25891,
2.27144, 2.28452, 2.29779, 2.31205, 2.32543, 2.33695, 2.3501,
2.36332, 2.37649, 2.39207, 2.40574, 2.42009, 2.43282, 2.44392,
2.45723, 2.46878, 2.47973, 2.49073, 2.49976, 2.51041, 2.51965,
2.52679, 2.53644, 2.54241, 2.54962, 2.55618, 2.56106, 2.56637,
2.57346, 2.57632, 2.58174, 2.58477, 2.58925, 2.5937, 2.59516,
2.59829, 2.60149, 2.60401, 2.6065, 2.61033, 2.6111, 2.61375,
2.61648, 2.61617, 2.62002, 2.62089, 2.62385, 2.62798, 2.62696,
2.63116, 2.63123, 2.63459, 2.63557, 2.64139, 2.64367, 2.64472,
2.64471, 2.65139, 2.64948, 2.6567, 2.65765, 2.65911, 2.65614,
2.66194, 2.66976, 2.66926, 2.67418, 2.6769)), class = "data.frame", row.names = c(NA,
-172L))
Sample Data
library(ggplot2)
df = "insert dput() code"
#plot sigmoidal curve
ggplot(df, aes(x = Temperature, y = Absorbance, color = "red")) +
geom_point() +
theme_classic()
If there are any R methods or statistical functions that I can implement, feel free to drop suggestions!
Unfortunately it doesn't look as though a logistic model fits your data very well (a logistic flattens out as x → ± infinity, while your curve looks linear at the extremes). We can do a little better though ...
Fit with self-starting four-parameter logistic (SSfpl(), built-in)
fit <- nls(Absorbance ~ SSfpl(Temperature, left, right, midpt, scale),
data = df)
(at this point I drew the picture as below with just this fit and saw that it was inadequate ...)
Refit with a new model, which is the SSfpl model plus a linear term, using the previous starting values plus slope = 0:
fit2 <- nls(Absorbance ~ left+(right-left)/(1+exp((midpt-Temperature)/scale))
+ (Temperature-midpt)*slope,
start = c(as.list(coef(fit)), slope = 0),
data = df)
Set up a data frame with the predictions:
pred <- data.frame(Temperature = df$Temperature,
Absorbance = predict(fit),
Absorbance2 = predict(fit2))
Draw the picture:
ggplot(df, aes(x = Temperature, y = Absorbance)) +
geom_point(color = "red") +
geom_line(data=pred, lwd = 2) +
geom_line(data=pred, aes(y=Absorbance2), colour = "blue") +
theme_classic()
The extended fit (blue) is very good for temperatures below 30 (linear), then slightly off for the rest of the range (worst near 60).
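To back up the visual comparison numerically, one option (a sketch, assuming fit and fit2 from above) is to compare residual standard errors and AIC:
sigma(fit)       # residual standard error of the plain four-parameter logistic
sigma(fit2)      # residual standard error of the logistic-plus-linear-term model
AIC(fit, fit2)   # the lower AIC indicates the better-fitting model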

ARIMA forecasts are way off

I am using ARIMA (auto.arima) to forecast 52 weeks ahead. The time series model fits the data well (see the plot below; the red line is the fitted values). The input data has a decreasing trend.
The forecasts (highlighted area), however, seem to just take off after the actual values end.
How can the forecasts be tamed?
dput of the input
> dput(baseTs)
structure(c(5.41951956469523, 5.49312499014084, 5.56299025716832,
5.64442852110163, 5.71385023974044, 5.77578632033402, 5.82985917237953,
5.86346591034374, 5.89626165157029, 5.92013286862512, 5.94200331713403,
5.93996840759539, 5.93917517855891, 5.90355191030718, 5.87180377346416,
5.83190030607801, 5.79624428055153, 5.75377043604686, 5.71445345904649,
5.70025269940165, 5.69789272204017, 5.73728731204876, 5.77015169357394,
5.78936321107329, 5.80113284575595, 5.79449448552444, 5.78193215198878,
5.74003482344406, 5.71694163930612, 5.66689345413153, 5.614357635737,
5.58578389962286, 5.55824727570498, 5.58495146060423, 5.61344117957187,
5.63637441850401, 5.65948408172102, 5.65558124383951, 5.64909390802285,
5.6664546352889, 5.68205689033408, 5.69991437586231, 5.72273650369514,
5.72006065065194, 5.71556512542993, 5.6717608006789, 5.64610326418084,
5.57193975508467, 5.49406607804055, 5.40126523530993, 5.31513540386482,
5.238437956722, 5.15362077920702, 5.11960611878249, 5.08498887979172,
5.08408134201562, 5.07361213981111, 5.04830559379816, 5.01401413448689,
5.0418662607737, 5.06947584464062, 5.08771495309317, 5.10587165060358,
5.1438369937098, 5.1815251206981, 5.2318657906363, 5.29385492077065,
5.29652029253008, 5.29998067741868, 5.28242409629194, 5.2722770646788,
5.24927444462166, 5.22226735874711, 5.16555064465208, 5.10956459841778,
5.09439240612378, 5.07617974794969, 5.04418337811006, 5.0075619037348,
4.99108423417745, 4.9874504485194, 4.99135285004736, 4.99217791657733,
4.94874445528885, 4.90320874819525, 4.84508278068469, 4.79086127023963,
4.75236840849279, 4.71431573721527, 4.71936529020481, 4.72422850167074,
4.72203091743033, 4.71732868614755, 4.71175323610448, 4.70566162766782,
4.71165837247331, 4.71767529028615, 4.75129316683193, 4.7863855803437,
4.85248191548789, 4.91865394024373, 4.9590849617955, 4.99960686851895,
5.02020678181827, 5.04201201976595, 5.02025906892952, 4.99735920720967,
4.92520279823639, 4.84822505567723, 4.81118504683572, 4.77330440072099,
4.72636395544651, 4.6861111959621, 4.64912520396312, 4.61348981514599,
4.58517820348434, 4.56378688913207, 4.549011597464, 4.52900600122321,
4.56028365470815, 4.60248987909752, 4.65628990381626, 4.70496326660038,
4.73779351647955, 4.76616725791407, 4.79569018347378, 4.83185281078024,
4.85177852259102, 4.87488251014986, 4.89468916229158, 4.9077984323135,
4.92375782591088, 4.96363767543938, 5.05416277704822, 5.1426680212522,
5.232495043331, 5.32153608753653, 5.41780853915163, 5.51131526881126,
5.62791210324026), .Tsp = c(2015.05769230769, 2017.73076923077,
52), class = "ts")
The code used:
weeks_forecasted <- 52   # forecasting 52 weeks ahead, as described above
fc <- try(auto.arima(baseTs, ic = 'aic', approximation = FALSE))
baseFc <- forecast(fc, h = weeks_forecasted)
baseVolume_forecast_new <- baseFc$mean
What could be the reason behind the forecasts exploding?
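A first diagnostic step (a sketch, assuming the forecast package is loaded and fc / baseFc come from the code above) is to inspect which model auto.arima selected, check its residuals, and plot the forecast with its prediction intervals:
summary(fc)          # shows the selected ARIMA order and whether a drift term was included
checkresiduals(fc)   # residual diagnostics for the fitted model
plot(baseFc)         # observed series, point forecasts and prediction intervals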

mgcv: plotting factor 'by' smooths

I want to plot the spline effect of a parameter called "NO2" on birthweight, but I want 4 graphs, one for each quartile. My current code gives only one graph; could you please help me figure out the problem? You can see the code at the end. model_1_F1_spline is adjusted for different parameters, but my question is about F1_quartile. When I adjust NO2 by F1_quartile, the fit includes results for four quartiles, but I don't know how to extract those results and draw 4 graphs.
Here is a reproducible example:
structure(list(coefficients = structure(c(2779.15322482481, 11.6029323631846,
-109.637722127332, -70.5777182211836, -33.2026137282293, 1.34507275289371,
-104.16616170941, -84.3138020433217, 17.079775791272, 49.2699120523702,
65.7993773354024, 73.9523088264003, 62.1308005103464, 11.8305504033343,
17.2509811135892, 34.167485824927, 37.5379409075558, 39.4891005510156,
2.08045456267659, 95.0617726758795, 159.185162814325, 216.767405256274,
30.4053773772453, 67.9509936017346, 75.9715680793893, 76.0634702947319,
197.304475883704, 346.536371507916, 452.520999581153, 582.904282791219,
646.972345369266, -13.117918823958, -21.2577276011179, -36.4775602045112,
-2.53495678184362, 4.25561833400684, -4.24061504987865, 1.22183358211853,
-17.6781972182122, -13.9465039223737, -24.9221422877004, -26.5305128528655,
2.72740931108257, 17.3508955652218, -4.33132009995294, -11.4103790176564,
48.1115836583216, -23.8853869176324, -11.9906695483978, 0.159117077270929,
3.1823388043623, -30.2233558177321, 22.9158634128136, 1.86241593993877,
-7.46279510854093, -17.7265172939209, 15.6908002520418, 10.7367940888643,
11.9368630460758, 48.0464522543244, -10.5383667390476, 8.84142833076189,
38.6344171322845, -4.18823289724547, 20.9039579936433, -27.1572322476693,
-23.3055121479652, -10.125234127069, -2.3505578660444, -5.59801575548779,
21.0487614265911, -0.113655733751338, 1.4592300415459, -0.395003023852113,
-1.33572259818002, -0.195697887437374, -1.22245366980104, 0.161927450428184,
-8.83284987935688, -11.7655241486702, 10.0814083754381, 4.95053998927621,
0.0512729497898481, -2.47612645668306, -0.324705343736638, -2.73702305143146,
0.367899109531455, -17.8006136959884, -20.7138572162521, 1.66439599003613,
0.991339450831016, -0.094477049206764, -0.333359963322134, -0.0535341357101135,
-0.166135609567417, 0.0263694684353763, -0.790300658406237, -7.88088655871398,
2.30124665956728, 0.526763779856579, -0.729268724581621, -1.64502812073609,
0.245438533444878, -1.68875200672467, 0.471404077584143, -12.0519624220913,
-8.61178665100117), .Names = c("(Intercept)", "M_ethni_cat3FB White",
"M_ethni_cat3USB Black", "M_ethni_cat3FB Black", "M_ethni_cat3USB Hispanic",
"M_ethni_cat3FB Hispanic", "M_ethni_cat3USB Asian", "M_ethni_cat3FB Asian",
"M_Age_Cat1", "M_Age_Cat2", "M_Age_Cat3", "M_Age_Cat4", "M_Age_Cat5",
"M_EDU_Cat1", "M_EDU_Cat2", "M_EDU_Cat3", "M_EDU_Cat4", "M_EDU_Cat5",
"MEDICAID1", "prepregBMI_4cat1", "prepregBMI_4cat2", "prepregBMI_4cat3",
"PNC_RECEIVED1", "Parity_Cat1", "Parity_Cat2", "Parity_Cat3",
"gest_clin38", "gest_clin39", "gest_clin40", "gest_clin41", "gest_clin42",
"concept_year2008", "concept_year2009", "concept_year2010", "conc_season_num2",
"conc_season_num3", "conc_season_num4", "s(UHF34).1", "s(UHF34).2",
"s(UHF34).3", "s(UHF34).4", "s(UHF34).5", "s(UHF34).6", "s(UHF34).7",
"s(UHF34).8", "s(UHF34).9", "s(UHF34).10", "s(UHF34).11", "s(UHF34).12",
"s(UHF34).13", "s(UHF34).14", "s(UHF34).15", "s(UHF34).16", "s(UHF34).17",
"s(UHF34).18", "s(UHF34).19", "s(UHF34).20", "s(UHF34).21", "s(UHF34).22",
"s(UHF34).23", "s(UHF34).24", "s(UHF34).25", "s(UHF34).26", "s(UHF34).27",
"s(UHF34).28", "s(UHF34).29", "s(UHF34).30", "s(UHF34).31", "s(UHF34).32",
"s(UHF34).33", "s(UHF34).34", "s(NO2300_mean_total):F1_quartile1.1",
"s(NO2300_mean_total):F1_quartile1.2", "s(NO2300_mean_total):F1_quartile1.3",
"s(NO2300_mean_total):F1_quartile1.4", "s(NO2300_mean_total):F1_quartile1.5",
"s(NO2300_mean_total):F1_quartile1.6", "s(NO2300_mean_total):F1_quartile1.7",
"s(NO2300_mean_total):F1_quartile1.8", "s(NO2300_mean_total):F1_quartile1.9",
"s(NO2300_mean_total):F1_quartile2.1", "s(NO2300_mean_total):F1_quartile2.2",
"s(NO2300_mean_total):F1_quartile2.3", "s(NO2300_mean_total):F1_quartile2.4",
"s(NO2300_mean_total):F1_quartile2.5", "s(NO2300_mean_total):F1_quartile2.6",
"s(NO2300_mean_total):F1_quartile2.7", "s(NO2300_mean_total):F1_quartile2.8",
"s(NO2300_mean_total):F1_quartile2.9", "s(NO2300_mean_total):F1_quartile3.1",
"s(NO2300_mean_total):F1_quartile3.2", "s(NO2300_mean_total):F1_quartile3.3",
"s(NO2300_mean_total):F1_quartile3.4", "s(NO2300_mean_total):F1_quartile3.5",
"s(NO2300_mean_total):F1_quartile3.6", "s(NO2300_mean_total):F1_quartile3.7",
"s(NO2300_mean_total):F1_quartile3.8", "s(NO2300_mean_total):F1_quartile3.9",
"s(NO2300_mean_total):F1_quartile4.1", "s(NO2300_mean_total):F1_quartile4.2",
"s(NO2300_mean_total):F1_quartile4.3", "s(NO2300_mean_total):F1_quartile4.4",
"s(NO2300_mean_total):F1_quartile4.5", "s(NO2300_mean_total):F1_quartile4.6",
"s(NO2300_mean_total):F1_quartile4.7", "s(NO2300_mean_total):F1_quartile4.8",
"s(NO2300_mean_total):F1_quartile4.9"))), .Names = "coefficients")
Here is how I do it:
model_1_F1_spline <- gam(BWGT~ s(UHF34,bs="re") + s(NO2300_mean_total, by=F1_quartile)+M_ethni_cat3 + M_Age_Cat + M_EDU_Cat + MEDICAID +
prepregBMI_4cat + PNC_RECEIVED + Parity_Cat + gest_clin + concept_year + conc_season_num, data=births_stressors, method="REML")
png(filename="plot_factor1_spline.png")
plot(model_1_F1_spline, ylab="Change in birth weight (g)", xlab="NO2")
dev.off()
From the provided coefficient vector of your fitted GAM, I can infer that F1_quartile is a factor by variable with levels 1, 2, 3, 4, so that you have smooth functions s(NO2300_mean_total):F1_quartile1, s(NO2300_mean_total):F1_quartile2, s(NO2300_mean_total):F1_quartile3 and s(NO2300_mean_total):F1_quartile4.
In this situation, calling plot.gam should give you 5 plots: a Q-Q plot of your 34-level random intercept s(UHF34, bs = 're'), and 4 plots for the by smooths.
Your question is mainly regarding the by smooths, so consider the following minimal reproducible example.
dat <- data.frame(y = rnorm(40), x = runif(40), f = gl(4, 10))
library(mgcv)
fit <- gam(y ~ f + s(x, k = 5, by = f), data = dat)
Note that you need to include the by factor as a covariate, too, because a factor by smooth is subject to a centering constraint (if this is unclear, you can skip it).
Now if you call plot.gam(fit, pages = 1), you will see 4 plots: a smooth s(x) for each level of f.
Note that plot.gam invisibly returns the data used to generate the plots. If you do
oo <- plot.gam(fit, pages = 1)
you will see that oo is a list of 4. For each element, say oo[[1]], $x and $fit give the x- and y-coordinates of the plot respectively, while $se gives the standard errors. $xlab gives the variable name and $ylab gives the smooth function name. These data are sufficient for you to reconstruct the plots produced by plot.gam yourself.
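For instance, a minimal sketch (continuing from oo above) that rebuilds the first by-smooth panel by hand, with rough 95% confidence bands:
p1 <- oo[[1]]                              # plotting data for the first by smooth
plot(p1$x, p1$fit, type = "l", xlab = p1$xlab, ylab = p1$ylab)
lines(p1$x, p1$fit + 2 * p1$se, lty = 2)   # upper band
lines(p1$x, p1$fit - 2 * p1$se, lty = 2)   # lower band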

Building and analysing trends in time series

I need advice about building time series. I have a bunch of files with monthly sea surface temperature data for a number of locations across 408 months. I have aggregated the monthly values in a data frame with the following structure:
longitude, latitude, SST for month 1, SST for month 2, ..., SST for month n
This is just a small piece of the data frame so you can see it:
dput(sst_subset)
structure(list(lon = c(-19.875, -19.625, -19.375, -19.125), lat = c(30.125,
30.125, 30.125, 30.125), sst = c(293.197412803228, 293.092251515256,
292.999348291526, 293.013219258958), sst.1 = c(292.490350607051,
292.504279178168, 292.502850606771, 292.438922036772), sst.2 = c(291.994832184947,
291.887412832509, 291.832896704695, 291.810638640677), sst.3 = c(292.095993473008,
292.066660140331, 292.091993473098, 292.110326806021), sst.4 = c(293.071606354427,
293.095799902274, 293.106445063326, 293.116122482465), sst.5 = c(294.981993408501,
294.996326741514, 295.004660074661, 295.018993407674), sst.6 = c(295.568703072806,
295.600315975326, 295.597735330222, 295.49418694544), sst.7 = c(296.250961122073,
296.175154672154, 296.079348222683, 296.052251449095)), .Names = c("lon",
"lat", "sst", "sst.1", "sst.2", "sst.3", "sst.4", "sst.5", "sst.6",
"sst.7"), row.names = c(NA, 4L), class = "data.frame")
To build a time series I have extracted a row of the data frame (corresponding to all the monthly values at one location, defined by longitude and latitude), transposed it to a column, and created a new data frame:
ncolumnes <- ncol(sst_all)
sst_point1 <- sst_all[1, 3:ncolumnes]   # first location: drop the lon and lat columns
sst1_df <- as.data.frame(t(sst_point1))
sst1_ts <- ts(sst1_df[[1]], start = 1982, frequency = 12)   # start and frequency taken from the dput() below
dput(sst1_ts)
structure(c(293.197412803228, 292.490350607051, 291.994832184947,
292.095993473008, 293.071606354427, 294.981993408501, 295.568703072806,
296.250961122073, 296.73166003606, 296.385154667461, 294.611660083445,
293.484186990367, 292.372896692626, 291.348207775437, 291.627090257683,
291.957326809441, 292.71063862056, 293.545326773947, 295.897412742879,
296.671928854599, 296.681326703851, 296.483864342674, 294.934660076226,
293.76709020985, 292.45870314232, 291.399993488565, 291.446767681068,
291.918993476964, 292.889025713347, 293.71099343691, 294.01418697852,
296.219025638916, 296.90166003226, 296.119993383065, 294.936326742855,
293.405154734069, 291.834509607885, 291.638564911804, 291.527412840556,
292.055326807251, 292.020961216621, 294.573660084295, 295.850315969738,
295.978380483004, 296.863660033109, 297.228380455065, 296.00866005222,
294.711606317771, 293.067735386772, 291.577136341748, 291.426445100877,
291.602993484028, 292.42096120768, 293.742993436195, 294.709348253305,
295.973219192797, 296.913993365318, 296.213219187433, 294.494326752735,
293.59225150408, 292.492251528667, 291.838207764485, 292.225477341082,
292.385993466526, 294.063864396765, 295.407326732328, 295.98386435385,
297.471928836718, 297.880660010378, 297.070638523107, 294.419993421063,
293.154509578381, 292.307735403759, 291.263441767479, 291.197412847932,
292.566660129155, 293.590316020253, 294.627660083088, 295.085477277156,
296.166122414292, 296.608660038809, 296.143864350273, 294.568660084407,
293.292251510786, 292.269670888481, 291.425350630855, 291.424832197687,
291.351326822986, 292.945799905626, 296.319660045269, 297.158380456629,
297.712251411991, 297.68699334804, 296.391928860858, 294.519660085502,
292.856445068914, 291.953864443927, 291.813922050742, 291.561606388179,
291.680660148958, 293.242574092542, 294.903326743593, 295.748057907507,
297.715799799009, 298.00999334082, 297.161606263009, 295.690326726002,
294.133541814562, 292.727412813734, 292.312493468169, 291.931928960546,
291.646326816392, 291.639670902563, 293.339326778551, 295.357090174311,
297.108703038385, 298.576993328147, 296.577735308317, 295.347660066995,
293.425154733622, 292.446445078078, 291.951027959007, 291.967735411359,
291.957993476093, 292.77838055453, 294.320326756624, 295.738703069007,
296.466122407586, 296.747993369028, 296.3506385392, 294.958326742363,
293.579348278562, 292.182574116234, 291.279279205549, 291.659993482754,
291.872993477993, 292.670316040816, 294.635326749583, 295.305477272238,
296.348057894096, 297.221993358433, 296.08612241608, 294.042993429489,
292.95160635711, 292.009670894293, 291.243207777784, 290.859025758721,
291.319993490353, 292.587412816863, 294.628660083066, 294.788057928965,
296.454832085258, 296.454326708925, 296.265477250781, 295.604326727924,
294.013219236607, 293.043541838926, 292.523922034872, 292.038703151708,
292.477326797818, 294.406122453631, 295.478993397392, 296.886122398199,
297.362251419814, 297.879993343726, 296.978703041291, 295.939326720436,
293.980638592173, 293.048703129133, 291.979993475601, 291.462896712966,
292.266326802534, 293.046445064667, 294.074993428774, 295.435477269333,
296.886122398199, 297.262660024191, 296.517090148383, 295.193326737111,
293.43967086233, 292.486122496546, 292.043564902752, 291.806767673021,
292.480660131077, 293.707735372467, 295.127326738586, 295.877735323964,
296.78192885214, 297.788326679108, 297.02450949188, 295.75766005783,
294.890315991195, 293.371606347722, 292.426422037051, 292.379670886022,
292.746993458457, 293.078057967186, 294.512993418984, 295.54612242815,
296.109348222013, 297.133660027074, 296.816767561039, 295.519326729824,
294.220638586809, 292.947412808816, 291.781422051468, 291.450638648723,
292.118660139168, 293.846122466148, 294.885993410647, 295.964832096211,
297.745154637062, 298.001326674347, 297.287735292448, 295.068993406557,
293.324509574581, 291.593864451974, 291.534821071758, 291.633219289804,
292.017993474752, 292.164187019871, 293.516660107921, 295.506122429044,
296.33321918475, 297.117660027432, 296.34741273282, 294.993660074907,
293.8032192413, 293.077735386549, 292.511779178, 292.344832177124,
292.459326798221, 293.437412797864, 295.860326722202, 296.416444989342,
297.083864329263, 298.678993325867, 297.782251410427, 295.657993393391,
293.652251502739, 293.274186995061, 292.307136325432, 291.922251541408,
291.564993484877, 292.452574110199, 293.996326763866, 294.823219218502,
296.541283696229, 297.421660020637, 296.747735304518, 295.771993390843,
294.041928913384, 293.317090219908, 292.421422037163, 292.680316040593,
292.577660128909, 293.240316028076, 295.254993402399, 296.815477238487,
297.524186900066, 298.126326671553, 297.598380446795, 295.563326728841,
294.207735361291, 293.43805795914, 293.115855519178, 292.753864426046,
292.466993464716, 292.925154744798, 296.035326718291, 296.538380470487,
298.612573972513, 298.241993335634, 297.065154652261, 295.770993390866,
293.72934827521, 292.379670886022, 291.370350632085, 291.601928967922,
292.473326797908, 293.597412794288, 294.678993415274, 296.042896610595,
297.383541741919, 297.729326680427, 296.714186918171, 295.008993407898,
293.465154732728, 292.365154757315, 292.279993468896, 291.722896707154,
292.651993460581, 293.469670861659, 295.145993404835, 296.262896605677,
297.257090131842, 297.550326684428, 297.544832060895, 296.194326714737,
294.499670838637, 293.095799902274, 292.836064885038, 292.445799916802,
292.78566012426, 293.216445060867, 294.3869934218, 295.256767595908,
296.333864346026, 296.692993370257, 296.250315960797, 295.23466006952,
293.713864404588, 292.874187004001, 292.378614156346, 291.931606379908,
292.099326806267, 293.999348269175, 295.055660073521, 296.170638543223,
296.729670788792, 297.024993362837, 296.646444984201, 294.817993412167,
293.368057960704, 292.39579991792, 291.174279207896, 291.343541876924,
291.974660142387, 292.742574103717, 294.785993412882, 296.685477241393,
297.067735297365, 297.318326689613, 297.265154647791, 296.419993376359,
294.439993420616, 293.224509576816, 293.140707735371, 292.928057970539,
293.028326785502, 293.116767643741, 294.067993428931, 295.034832116997,
296.24192886421, 297.204660025487, 297.0212836855, 295.618993394263,
294.195477297049, 293.26644505975, 292.1507077575, 291.842574123834,
292.212326803741, 292.898380551848, 293.698660103853, 294.868057927177,
296.104832093081, 297.440660020212, 296.802574012969, 295.234993402846,
293.692574082483, 292.617090235554, 291.535510726915, 291.344832199475,
292.175660137894, 293.799025693007, 295.795993390307, 296.195799832983,
297.432573998888, 298.643659993323, 297.612251414226, 296.027326718469,
294.692896640769, 293.446122475089, 292.611779175765, 292.494832173771,
293.027326785525, 293.948380528378, 294.144326760558, 295.259670821649,
296.524509503055, 297.014660029734, 296.854832076317, 295.413326732193,
294.306122455866, 292.857735391466, 291.982493475545, 291.549025743299,
292.710993459262, 293.044832161478, 294.210660092408, 296.063864352061,
296.959993364289, 298.161660004097, 297.040315943139, 295.179326737424,
293.474509571228, 292.265799920826, 291.409993488342, 291.042574141715,
291.81732681257, 293.374186992826, 294.908993410133, 296.215799832536,
297.686767541593, 298.667326659461, 297.63999334909, 295.589993394911,
294.077412783559), .Dim = c(408L, 1L), .Dimnames = list(NULL,
"1"), .Tsp = c(1982, 2015.91666666667, 12), class = "ts")
and then decompose it into its additive trend, seasonal, and random components and remove the seasonal component from the original data:
sst1_dec<-decompose(sst1_ts)
sst1_noseason<-sst1_ts - sst1_dec$seasonal
Now, how do I run a linear regression on this data (sst1_noseason)? I have tried lm(), but as there is only a single variable in the data frame, I think I can't. Should I build a new date column (time) with monthly dates and then run lm(sst ~ time)?
Is there any other R package for time series that can do this better? I have looked at ggseas and tidyr; they seem promising, but maybe I need to build that date column to run this analysis in any case.
My final objective is to have a single value for the trend at each longitude and latitude point and to plot a map to look for the areas with the highest climatic trend in sea surface temperature.
Maybe there is a better procedure and you could point me to another R package for spatio-temporal analysis. Any help would be appreciated.
Thanks in advance for your help.
I am not a fan of specialised classes in R, since they are usually not as intuitive and require additional vocabulary to deal with. Here's an attempt to convert the time series you made into a data.frame, using the zoo package:
library(zoo)
df1 <- data.frame(zoo(sst1_ts), time = as.yearmon(time(sst1_ts)))   # the single ts column becomes X1
df1$jday <- as.Date(df1$time)                                       # convert year-month to a Date
(fit1 <- lm(X1 ~ jday, df1))
Call:
lm(formula = X1 ~ jday, data = df1)

Coefficients:
(Intercept)         jday
  2.937e+02    6.025e-05
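Since jday is a Date, the slope is in SST units per day; a rough conversion (a sketch, reusing fit1 from above) gives the trend per year, which is the kind of single per-location value you are after:
coef(fit1)["jday"] * 365.25   # roughly 0.022 K per year for this grid point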
Plotting is more intuitive with a data.frame as well:
library(ggplot2)
base <- ggplot(df1, aes(jday, X1)) + geom_line() + stat_smooth(method="lm")
p<-base + scale_x_date(date_labels = "%Y")
You can go further and use an interactive package such as plotly to explore the plot via ggplotly():
library(plotly)
ggplotly(p)
