Overlaying a histogram with normal distribution - r

I need to draw a histogram of data (weights) overlaid with a line of the expected normal distribution.
I am totally new to R and statistics. I know I probably got something fundamentally wrong about frequencies, density and dnorm, but I am stuck.
weights <- c(97.6,95,94.3 ,92.3 ,90.7 ,89.4 ,88.2 ,86.9 ,85.8 ,85.5 ,84.4 ,84.1 ,82.5 ,81.4 ,80.8 ,80 ,79.8 ,79.5 ,78.4 ,78.4 ,78.2 ,78.1 ,78 ,77.4 ,76.5 ,75.4 ,74.8 ,74.1 ,73.5 ,73.2 ,73 ,72.3 ,72.3 ,72.2 ,71.8 ,71.7 ,71.6 ,71.6 ,71.5 ,71.3 ,70.7 ,70.6 ,70.5 ,69.2 ,68.6 ,68.3 ,67.5 ,67 ,66.8 ,66.6 ,65.8 ,65.6 ,64.9 ,64.6 ,64.5 ,64.5 ,64.3 ,64.2 ,63.9 ,63.7 ,62.7 ,62.3 ,62.2 ,59.4 ,57.8 ,57.8 ,57.6 ,56.4 ,53.6 ,53.2 )
hist(weights)
m <- mean(weights)
sd <- sd(weights)
x <- seq(min(weights), max(weights), length.out = length(weights))
xn <- dnorm(x, mean = m, sd = sd) * length(weights) #what is the correct factor???
lines(x, xn)
I expected the line to follow the histogram approximately, but it sits far too low in the histogram.

What you need is to plot the histogram on the density scale rather than as raw counts, and then overlay the estimated density of the weights, i.e.
weights = c(97.6,95,94.3 ,92.3 ,90.7 ,89.4 ,88.2 ,86.9 ,85.8 ,85.5 ,84.4 ,84.1 ,82.5 ,81.4 ,80.8 ,80 ,79.8 ,79.5 ,78.4 ,78.4 ,78.2 ,78.1 ,78 ,77.4 ,76.5 ,75.4 ,74.8 ,74.1 ,73.5 ,73.2 ,73 ,72.3 ,72.3 ,72.2 ,71.8 ,71.7 ,71.6 ,71.6 ,71.5 ,71.3 ,70.7 ,70.6 ,70.5 ,69.2 ,68.6 ,68.3 ,67.5 ,67 ,66.8 ,66.6 ,65.8 ,65.6 ,64.9 ,64.6 ,64.5 ,64.5 ,64.3 ,64.2 ,63.9 ,63.7 ,62.7 ,62.3 ,62.2 ,59.4 ,57.8 ,57.8 ,57.6 ,56.4 ,53.6 ,53.2 )
hist(weights, prob = T)
lines(density(weights), col = "red")
Hope this helps.

The problem in your code is that hist plots frequencies (counts) by default, whereas dnorm calculates densities.
You can draw the histogram on the density scale instead by adding freq = FALSE; the dnorm curve, without the length(weights) factor, then matches the histogram's scale:
hist(weights, freq = F)

You're nearly there, you just have to factor in the histogram bin widths.
weights <- c(97.6, 95, 94.3, 92.3, 90.7, 89.4, 88.2, 86.9, 85.8,
85.5, 84.4, 84.1, 82.5, 81.4, 80.8, 80, 79.8, 79.5, 78.4, 78.4,
78.2, 78.1, 78, 77.4, 76.5, 75.4, 74.8, 74.1, 73.5, 73.2, 73,
72.3, 72.3, 72.2, 71.8, 71.7, 71.6, 71.6, 71.5, 71.3, 70.7,
70.6, 70.5, 69.2, 68.6, 68.3, 67.5, 67, 66.8, 66.6, 65.8, 65.6,
64.9, 64.6, 64.5, 64.5, 64.3, 64.2, 63.9, 63.7, 62.7, 62.3,
62.2, 59.4, 57.8, 57.8, 57.6, 56.4, 53.6, 53.2)
h <- hist(weights, freq=TRUE)
binwi <- diff(h$breaks)[1]
x <- seq(min(weights)-10, max(weights)+10, 0.01)
xn <- dnorm(x, mean=mean(weights), sd=sd(weights)) * length(weights) * binwi
lines(x, xn)
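Why the factor is length(weights) * binwi can be checked numerically. The following is a minimal sketch, assuming equally spaced breaks (the hist default); the sample w is made up for illustration and is not the asker's data. It rescales the density-scale bar heights back to counts:

```r
set.seed(1)                          # made-up sample, for illustration only
w <- rnorm(70, mean = 72, sd = 10)

h <- hist(w, plot = FALSE)           # compute the bins without drawing
binwidth <- diff(h$breaks)[1]        # equal-width bins assumed

# density = count / (n * binwidth), so multiplying back recovers the counts
rescaled <- h$density * length(w) * binwidth
all.equal(rescaled, as.numeric(h$counts))
```

The same scaling applied to dnorm puts the normal curve on the count axis of a freq = TRUE histogram.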

Related

Unexpected result while using lowess to smooth a data.table column in R

I have a data.table test_dt in which I want to smooth the y column using lowess function.
test_dt <- structure(list(x = c(28.75, 30, 31.25, 32.5, 33.75, 35, 36.25,
37.5, 38.75, 40, 41.25, 42.5, 43.75, 45, 46.25, 47.5, 48.75,
50, 52.5, 55, 57.5, 60, 62.5, 63.75, 65, 67.5, 70, 72.5, 75,
77.5, 80, 82.5, 85, 87.5, 90, 92.5, 95, 97.5, 100, 102.5, 103.75,
105, 106.25, 107.5, 108.75, 110, 111.25, 112.5, 113.75, 115,
116.25, 117.5, 118.75, 120, 121.25, 122.5, 125, 130, 135, 140,
145), y = c(116.78, 115.53, 114.28, 113.05, 111.78, 110.53, 109.28,
108.05, 106.78, 105.53, 104.28, 103.025, 101.775, 100.525, 99.28,
98.05, 96.8, 95.525, 93.1, 90.65, 88.225, 85.775, 83.35, 82.15,
80.9, 78.5, 76.075, 73.675, 71.25, 68.85, 66.5, 64.075, 61.725,
59.4, 57.075, 54.725, 52.475, 50.225, 48, 45.75, 44.65, 43.55,
42.475, 41.45, 40.35, 39.275, 38.25, 37.225, 36.175, 35.175,
34.175, 33.225, 32.275, 31.3, 30.35, 29.45, 27.625, 24.175, 21,
18.125, 15.55), z = c(116.778248424972, 115.531456655985, 114.284502467544,
113.034850770519, 111.784500981402, 110.533319511795, 109.284500954429,
108.034850457264, 106.784502297216, 105.531265565238, 104.278221015846,
103.026780249377, 101.775992395759, 100.528761292272, 99.2853168637851,
98.043586202838, 96.8021989104315, 95.5702032427799, 93.1041279347743,
90.6575956222915, 88.2179393348852, 85.783500434839, 83.3503011023971,
82.136280706039, 80.922846825298, 78.4965179152157, 76.0823895453039,
73.6686672097464, 71.264486719796, 68.8702598156142, 66.4865368523571,
64.1182523898466, 61.7552221811808, 59.4004347738795, 57.0823289450761,
54.7908645949795, 52.5071096685879, 50.2308279167219, 47.9940967492558,
45.7658417529877, 44.6514226583931, 43.5622751034012, 42.4876666190815,
41.4173110074806, 40.3555584369672, 39.3004471381618, 38.2552969838653,
37.2202353638959, 36.1963659189447, 35.1889616530209, 34.2004259883859,
33.2295174626826, 32.2669278456991, 31.3171387914754, 30.3742375589802,
29.4555719783757, 27.6243725086786, 23.9784367995753, 27.625,
27.625, 27.625)), row.names = c(NA, -61L), class = c("data.table",
"data.frame"))
As can be seen in the image below, I am getting an unexpected result. The expected result is that the line (z column) in the graph below should closely follow the points (y column).
Here is my code -
library(data.table)
library(ggplot2)
test_dt[, z := lowess(x = x, y = y, f = 0.1)$y]
ggplot(test_dt) + geom_point(aes(x, y)) + geom_line(aes(x, z))
Q1. Can someone suggest why lowess is not smoothing properly?
Q2. Since lowess is not working as expected, is there any other function in R that would be more efficient in smoothing the y column without producing a spike (as lowess did on the boundary points)?
You could use loess instead:
test_dt[, z := predict(loess(y ~ x, data = test_dt))]
ggplot(test_dt) + geom_point(aes(x, y)) + geom_line(aes(x, z))
Note though, that if all you want to do is plot the line, this is exactly the method that geom_smooth uses, so without even creating a z column, you could do:
ggplot(test_dt, aes(x, y)) + geom_point() + geom_smooth()
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2021-11-07 by the reprex package (v2.0.0)
The problem was solved by setting the number of iterations to zero. lowess acts like loess when iter = 0.
test_dt[, z := lowess(x = x, y = y, f = 0.1, iter=0)$y]
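The effect of iter can be explored on a small example. This is only a sketch on synthetic, noise-free straight-line data (x and y below are made up, not the asker's test_dt). With iter = 0 the robustifying passes are skipped and lowess performs a plain local linear fit, which should reproduce a straight line:

```r
x <- seq(0, 10, length.out = 50)     # synthetic data, for illustration only
y <- 100 - 3 * x                     # exactly linear, no noise

fit_default <- lowess(x, y, f = 0.1)            # iter = 3 by default
fit_plain   <- lowess(x, y, f = 0.1, iter = 0)  # no robustifying passes

# Largest deviation of each smooth from the data
max(abs(fit_plain$y - y))
max(abs(fit_default$y - y))
```

Comparing the two deviations on your own data is a quick way to see whether the robustness iterations are responsible for a spike.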

How to specify data range for nls function in r?

I am sorry for what may be a trivial question, but unfortunately I haven't found a solution for it. Here is my problem...
I have created a function bm6 with 3 unknown parameters (a, l, p), with which I want to approximate the measured data found in the data frame zz. For fitting I have used the nls model in R.
nls(zz$tuReMa~bm6(zz$Time, t0=30, tau=10, a, l, p), data=zz, start=list(a=0.01, l=0.01, p=0.1))
The model converges on the whole data range, that is, from row 1 to row 100, and yields the parameters I am looking for.
Data and fit plot:
https://i.stack.imgur.com/iOn5q.png
Now I want to restrict the data range, so that the nls model uses only the data between row 7 and row 37. How can I do that? I have already tried a few things, but without success.
nls(zz$tuReMa[7:37]~bm6(zz$Time[7:37], t0=30, tau=10, a, l, p), data=zz, start=list(a=0.01, l=0.01, p=0.1))
the latter works fine in a lm model
other data argument data=list(zz$Time[7:37],zz$tuReMa[7:37])
with subset argument subset = c(7:37)
One can also create a new data frame with the values from rows 7:37 and then apply the nls model to the new data frame, but I hope it can also be done without this detour.
Additional data:
bm6 <- function(t, t0=30, tau=10, a, l, p) {
ifelse(t<=(tau+t0), 1+a/(p-l)*(exp(-l*(t-t0))/l*(exp(l*(t-t0))-1)-exp(-p*(t-t0))/p*(exp(p*(t-t0))-1)),
1+a/(p-l)*(exp(-l*(t-t0))/l*(exp(l*tau)-1)-exp(-p*(t-t0))/p*(exp(p*tau)-1)))
}
DATA
zz <- structure(list(Time = c(0, 5.01, 10.01, 15.02, 20.02, 25.03,
30.03, 35.04, 40.04, 45.05, 50.05, 55.05, 60.06, 66.07, 71.07,
76.08, 81.08, 86.09, 91.09, 96.1, 101.1, 106.11, 111.11, 116.12,
121.12, 126.13, 131.13, 136.14, 142.14, 147.15, 152.15, 157.16,
162.16, 167.17, 172.17, 177.18, 182.18, 187.19, 192.19, 197.2,
202.2, 207.21, 213.21, 218.22, 223.22, 228.23, 233.23, 238.24,
243.24, 248.25, 253.25, 258.26, 263.26, 268.27, 273.27, 278.28,
284.29, 289.29, 294.29, 299.3, 304.3, 309.31, 314.31, 319.32,
324.32, 329.33, 334.34, 339.34, 344.34, 349.35, 355.36, 360.36,
365.36, 370.37, 375.38, 380.38, 385.39, 390.39, 395.39, 400.4,
405.41, 410.41, 415.41, 420.42, 426.43, 431.43, 436.43, 441.44,
446.45, 451.45, 456.46, 461.46, 466.46, 471.47, 476.48, 481.48,
486.48, 491.49, 497.5, 502.5), tu = c(24.8, 16.4, 24.1, 25.8,
20.2, 21, 18.6, 11.8, 21.1, 66.8, 67.4, 72.5, 73.3, 71.6, 72,
65.5, 67.8, 57.1, 61.5, 58.6, 55.9, 60.2, 54.1, 54.6, 52.7, 54.3,
49.8, 49.4, 54.8, 49, 52.4, 50.8, 45.9, 48.4, 48.1, 48.1, 50.5,
44.2, 42.9, 47.3, 51.7, 46.1, 46.9, 44.6, 46.1, 48, 43.2, 38.5,
49.7, 47, 46.9, 51.8, 45, 46.7, 45.8, 39.8, 43.8, 43.3, 45.5,
45.3, 45.9, 38.9, 44.4, 40.8, 40.5, 39.8, 43, 38, 44.7, 42.1,
43, 39.4, 36.6, 44.9, 42.8, 37.2, 41.7, 41.8, 34.7, 44.4, 43.8,
44.7, 44.6, 46.5, 49.7, 42, 36.3, 43.5, 43.7, 41.7, 39.3, 42.5,
45.4, 37.6, 46, 38.5, 39.6, 37.7, 37.9, 39.9), mu = c(26.64,
27.16, 23.43, 24.35, 24.79, 25.4, 25.27, 23.61, 25.36, 27.47,
30.17, 29.94, 28.06, 32.19, 30.96, 35.87, 32.48, 32.41, 33.09,
35.4, 33.68, 33.5, 32.83, 34.19, 32.25, 34.76, 33.69, 33.03,
35.09, 37.13, 36.64, 33.51, 32.91, 33.56, 34.78, 36.06, 33.74,
32.87, 35.57, 36.17, 35.52, 34.43, 33.85, 33.93, 36.69, 34.77,
34.14, 33.46, 34.14, 34.5, 33.03, 33.69, 33.02, 34.23, 33.22,
35.46, 34.28, 31.87, 32.91, 34.25, 33.75, 33.66, 31.08, 32.72,
36.13, 35.3, 32.37, 31.25, 32.98, 34, 34.3, 33.69, 32.33, 33.01,
36.03, 31.59, 34.09, 30.76, 31.8, 32.93, 35.32, 33.69, 31.58,
33.99, 33.67, 33.89, 32.99, 31.17, 32.08, 33.42, 33.91, 34.36,
31.96, 33.27, 31.9, 33.7, 33.16, 30.01, 32.04, 33.59), tuRE = c(1.15043074884029,
0.76076872100729, 1.11795891318754, 1.19681908548708, 0.937044400265076,
0.974155069582505, 0.862823061630219, 0.547382372432074, 0.978793903247184,
3.0987408880053, 3.12657388999337, 3.36315440689198, 3.40026507620941,
3.32140490390987, 3.33996023856859, 3.03843605036448, 3.14512922465209,
2.64877402253148, 2.85288270377734, 2.71835652750166, 2.59310801855533,
2.79257786613651, 2.50960901259112, 2.53280318091451, 2.44466534128562,
2.51888667992048, 2.31013916500994, 2.29158383035123, 2.54208084824387,
2.27302849569251, 2.43074884029158, 2.35652750165673, 2.12922465208747,
2.24519549370444, 2.2312789927104, 2.2312789927104, 2.34261100066269,
2.05036447978794, 1.99005964214712, 2.19416832339298, 2.39827700463883,
2.13850231941683, 2.17561298873426, 2.06891981444665, 2.13850231941683,
2.22664015904573, 2.00397614314115, 1.78595096090126, 2.30550033134526,
2.18025182239894, 2.17561298873426, 2.40291583830351, 2.08747514910537,
2.1663353214049, 2.1245858184228, 1.84625579854208, 2.03180914512922,
2.00861497680583, 2.11066931742876, 2.1013916500994, 2.12922465208747,
1.80450629555997, 2.0596421471173, 1.89264413518887, 1.87872763419483,
1.84625579854208, 1.9946984758118, 1.76275679257787, 2.07355864811133,
1.95294897282969, 1.9946984758118, 1.82770046388337, 1.69781312127237,
2.08283631544069, 1.98542080848244, 1.72564612326044, 1.93439363817097,
1.93903247183565, 1.60967528164347, 2.0596421471173, 2.03180914512922,
2.07355864811133, 2.06891981444665, 2.15705765407555, 2.30550033134526,
1.94831013916501, 1.68389662027833, 2.01789264413519, 2.02717031146455,
1.93439363817097, 1.82306163021869, 1.9715043074884, 2.10603048376408,
1.74420145791915, 2.13386348575215, 1.78595096090126, 1.83697813121272,
1.74884029158383, 1.75811795891319, 1.85089463220676), tuMA = c(24.8,
20.6, 21.7666666666667, 22.775, 22.2733333333333, 21.8533333333333,
20.8866666666667, 17.5066666666667, 18.0466666666667, 34.1333333333333,
47.3133333333333, 59.1, 67.56, 71.3533333333333, 71.9133333333333,
69.96, 68.9, 64.5866666666667, 62.82, 60.76, 58.6933333333333,
58.7, 57.18, 56.0266666666667, 54.7, 54.3, 52.5066666666667,
51.2733333333333, 52.1533333333333, 51.0866666666667, 51.4, 51.3066666666667,
49.5133333333333, 48.7866666666667, 48.3866666666667, 48.0466666666667,
48.7933333333333, 47.46, 45.8066666666667, 45.9866666666667,
47.6866666666667, 47.28, 47.4333333333333, 46.64, 46.2333333333333,
46.54, 45.4933333333333, 43.0733333333333, 44.9466666666667,
45.58, 46.12, 48.3666666666667, 47.7733333333333, 47.3133333333333,
46.7533333333333, 44.2733333333333, 43.6, 43.2933333333333, 43.8333333333333,
44.3866666666667, 45.1733333333333, 43.22, 43.4266666666667,
42.36, 41.5066666666667, 40.74, 41.4466666666667, 40.2133333333333,
41.64, 41.94, 42.4333333333333, 41.5133333333333, 39.9, 41.1466666666667,
41.68, 40.3, 40.8066666666667, 41.1933333333333, 38.8666666666667,
40.4533333333333, 41.7333333333333, 42.8733333333333, 43.78,
45.1333333333333, 46.7666666666667, 45.48, 42.4133333333333,
42.3066666666667, 42.34, 41.8933333333333, 41.18, 41.7133333333333,
42.8, 41.16, 42.7266666666667, 41.5066666666667, 40.7066666666667,
39.4666666666667, 38.8066666666667, 38.7933333333333), tuReMa = c(1.15043074884029,
0.955599734923791, 1.00971946101171, 1.05649436713055, 1.03322288491275,
1.0137397835211, 0.968897724762536, 0.812105146896399, 0.837154848685664,
1.58338855754363, 2.19478683454827, 2.74155069582505, 3.13399602385686,
3.309962447537, 3.3359399160592, 3.24532803180915, 3.19615639496355,
2.99606803622708, 2.91411530815109, 2.81855533465871, 2.72268610558869,
2.72299536116634, 2.65248508946322, 2.59898387453059, 2.53744201457919,
2.51888667992048, 2.43569692953391, 2.37848464766954, 2.41930638391871,
2.36982549149547, 2.3843605036448, 2.38003092555776, 2.2968411751712,
2.26313231720786, 2.24457698254915, 2.22880494808924, 2.26344157278551,
2.20159045725646, 2.12489507400044, 2.13324497459686, 2.2121051468964,
2.19324055666004, 2.20035343494588, 2.1635520212061, 2.14468743096974,
2.15891318754142, 2.11036006185112, 1.99810028716589, 2.08500110448421,
2.1143803843605, 2.13943008614977, 2.24364921581621, 2.21612546940579,
2.19478683454827, 2.16880936602607, 2.05376629114204, 2.02253147779987,
2.00830572122819, 2.03335542301745, 2.05902363596201, 2.09551579412414,
2.00490390987409, 2.01449083278109, 1.96500994035785, 1.92542522641926,
1.88986083499006, 1.92264192622046, 1.86542964435609, 1.93161033797217,
1.9455268389662, 1.96841175171195, 1.92573448199691, 1.85089463220676,
1.90872542522642, 1.93346587143804, 1.86944996686547, 1.89295339076651,
1.91089021426994, 1.80296001767175, 1.87656284515131, 1.9359399160592,
1.98882261983654, 2.03088137839629, 2.09366026065827, 2.16942787718136,
2.10974155069583, 1.96748398497901, 1.96253589573669, 1.96408217362492,
1.94336204992269, 1.91027170311465, 1.93501214932626, 1.98542080848244,
1.90934393638171, 1.98201899712834, 1.92542522641926, 1.88831455710183,
1.83079301965982, 1.80017671747294, 1.79955820631765)), row.names = c(NA,
-100L), class = "data.frame")
I will be really thankful for a solution.
nls itself has a subset argument, e.g. using the built-in CO2 data.frame this uses only the first 10 rows:
nls(uptake ~ a + b * conc, CO2, start = list(a = 0, b = 1), subset = 1:10)
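As a quick check that subset does the same job as slicing the data frame first, here is a small sketch using the same built-in CO2 data. The object names co2, fit_subset and fit_sliced are only illustrative:

```r
co2 <- as.data.frame(CO2)            # built-in dataset, as in the example above

# Fit on rows 1:10 via the subset argument ...
fit_subset <- nls(uptake ~ a + b * conc, data = co2,
                  start = list(a = 0, b = 1), subset = 1:10)

# ... and on an explicitly sliced data frame
fit_sliced <- nls(uptake ~ a + b * conc, data = co2[1:10, ],
                  start = list(a = 0, b = 1))

# Both routes should give the same coefficients
all.equal(coef(fit_subset), coef(fit_sliced))
```

Note that subset can only act on variables that nls looks up in data, so the formula should refer to bare column names rather than to zz$... expressions.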
ADDED
Regarding the edited question, which now presents the full problem, the issues are:
zz should not be part of the formula
better starting values are needed
c(7:37) is the same as 7:37. The c is superfluous.
Remove zz and use the result of the full optimization to start the subset problem:
fm0 <- nls(tuReMa ~ bm6(Time, t0=30, tau=10, a, l, p), data=zz,
start=list(a=0.01, l=0.01, p=0.1));
fm <- nls(tuReMa~bm6(Time, t0=30, tau=10, a, l, p), data=zz,
start=coef(fm0), subset = 7:37)
fm
giving:
Nonlinear regression model
  model: tuReMa ~ bm6(Time, t0 = 30, tau = 10, a, l, p)
   data: zz
       a        l        p
0.014206 0.007979 0.049172
 residual sum-of-squares: 1.678
Number of iterations to convergence: 23
Achieved convergence tolerance: 9.615e-06

add_trace: control the linetype without warning

I am writing a function which returns a plotly object. I managed to control the colors already. However I have trouble controlling the linetype. Currently I use something like:
plot_ly(colors=c(rep(c("#CD0C18","#1660A7"),each=3),'#9467bd'),linetypes = c(rep(c("dot","dash","solid"),2),"dot")) %>%
add_trace(data=long_data,x=~month,y=~temperature,color=~measure,linetype=~measure,type="scatter",mode="lines",line=list(width=4)) %>%
layout(title = "Average High and Low Temperatures in New York",
xaxis = list(title = "Months", categoryorder="array", categoryarray=month),
yaxis = list (title = "Temperature (degrees F)"))
which returns me a warning:
Warning message:
plotly.js only supports 6 different linetypes
The warning makes sense, since measure has seven levels. However I would like to control the linetype without getting a warning every time I have more than 6 traces to plot - is there a way?
My sample data:
month <- c('January', 'February', 'March', 'April', 'May', 'June', 'July',
'August', 'September', 'October', 'November', 'December')
high_2000 <- c(32.5, 37.6, 49.9, 53.0, 69.1, 75.4, 76.5, 76.6, 70.7, 60.6, 45.1, 29.3)
low_2000 <- c(13.8, 22.3, 32.5, 37.2, 49.9, 56.1, 57.7, 58.3, 51.2, 42.8, 31.6, 15.9)
mid_2000 <-apply(rbind(high_2000,low_2000),2,mean)
high_2007 <- c(36.5, 26.6, 43.6, 52.3, 71.5, 81.4, 80.5, 82.2, 76.0, 67.3, 46.1, 35.0)
low_2007 <- c(23.6, 14.0, 27.0, 36.8, 47.6, 57.7, 58.9, 61.2, 53.3, 48.5, 31.0, 23.6)
high_2014 <- c(28.8, 28.5, 37.0, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9)
low_2014 <- c(12.7, 14.3, 18.6, 35.5, 49.9, 58.0, 60.0, 58.6, 51.7, 45.2, 32.2, 29.1)
data <- data.frame(month, high_2000, low_2000,mid_2000, high_2007, low_2007, high_2014, low_2014)
long_data<-tidyr::gather(data,measure,temperature,-month)
As can be seen here, the warning arises in
validLinetypes <- as.character(Schema$traces$scatter$attributes$line$dash$values)
if (length(pal) > length(validLinetypes)) {
warning("plotly.js only supports 6 different linetypes", call. = FALSE)
}
So, if you want to disable this warning alone, there are only two things you can do: override the whole function or manually extend Schema$traces$scatter$attributes$line$dash$values. The latter is somewhat less intrusive and can be done with
tmp <- plotly:::Schema
tmp$traces$scatter$attributes$line$dash$values <- c(tmp$traces$scatter$attributes$line$dash$values, rep(NA, 100))
assignInNamespace("Schema", tmp, ns = "plotly")
Here we add NA 100 times so that up to 106 line types now wouldn't provoke a warning. The last line overrides the Schema variable with tmp in the plotly package environment.
The vector Schema$traces$scatter$attributes$line$dash$values only gets used (through validLinetypes) here four times, and looking at those it seems like this cheating doesn't have any likely side effects.
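If patching the package namespace feels too invasive, a different technique (not the one described above) is to muffle just that one warning with base R's condition system. The sketch below is generic: muffle_matching is a hypothetical helper name and noisy() merely stands in for whatever call emits the warning. With plotly, the warning may only be raised when the plot is built or printed, so that step would be the one to wrap; this is an assumption to verify in your own setup.

```r
# Hypothetical helper: evaluate expr, silencing only warnings whose
# message contains `pattern`; every other warning passes through.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for a call that raises the plotly warning
noisy <- function() {
  warning("plotly.js only supports 6 different linetypes", call. = FALSE)
  "done"
}

res <- muffle_matching(noisy(), "only supports 6 different linetypes")
```

Because the handler checks the message text, it would stop working if the wording of the warning changes in a later plotly release.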

Opacity by numeric vector

I have a very stylized line chart and I would like to plot all my lines with a single add_trace command, in the hope that this makes my code neater.
I have two issues:
I want a lower opacity for the high lines and full opacity for the low lines. When I try this, the opacity seems to be assigned randomly.
I want the upper lines to be solid and the lower ones to be dashed. The opposite happens.
Concerning these issues I have two questions:
Can this 'strange' (illogical) behaviour of plotly be fixed? Maybe by using some layout option?
Does this 'strange' behaviour occur because I am straying off the intended path?
The plotly examples usually tell you to write a new add_trace call for every line or other object you add. I am trying to implement all my lines with one add_trace call.
In my real data, I have more than ten lines to draw and it would really help if I could draw some of the lines together. Here is a sample graph with data:
I tried with this code:
month <- c('January', 'February', 'March', 'April', 'May', 'June', 'July',
'August', 'September', 'October', 'November', 'December')
high_2000 <- c(32.5, 37.6, 49.9, 53.0, 69.1, 75.4, 76.5, 76.6, 70.7, 60.6, 45.1, 29.3)
low_2000 <- c(13.8, 22.3, 32.5, 37.2, 49.9, 56.1, 57.7, 58.3, 51.2, 42.8, 31.6, 15.9)
high_2007 <- c(36.5, 26.6, 43.6, 52.3, 71.5, 81.4, 80.5, 82.2, 76.0, 67.3, 46.1, 35.0)
low_2007 <- c(23.6, 14.0, 27.0, 36.8, 47.6, 57.7, 58.9, 61.2, 53.3, 48.5, 31.0, 23.6)
high_2014 <- c(28.8, 28.5, 37.0, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9)
low_2014 <- c(12.7, 14.3, 18.6, 35.5, 49.9, 58.0, 60.0, 58.6, 51.7, 45.2, 32.2, 29.1)
data <- data.frame(month, high_2000, low_2000, high_2007, low_2007, high_2014, low_2014)
library(plotly)
df<-tidyr::gather(data,key,values,-month)
plot_ly(data=df,x=~month,y=~values,split=~key,type="scatter",
mode="lines",opacity=ifelse(grepl('high',df$key),0.5,1),line=list(color='#1f77b4'),
linetype=ifelse(grepl('2000',df$key),'solid','dashed')) %>%
layout(xaxis=list(categoryarray = month, categoryorder = "array"))
Does this work for you?
df$key <- as.factor(df$key)
df$key <- factor(df$key , levels = c("high_2014","high_2007", "high_2000", "low_2014","low_2007", "low_2000"))
df$high_low <- substr(df$key, 1, 2)
df$high_low <- factor(df$high_low , levels = c("hi","lo"))
df <- df %>% arrange(high_low)
plot_ly(data=df,x=~month,y=~values,split=~key,type="scatter", mode="lines", opacity=ifelse(grepl('high',df$key),0.5,1), line=list(color='#1f77b4'),
linetype= ~ high_low, linetypes = c('solid', 'dashed')) %>%
layout(xaxis=list(categoryarray = month, categoryorder = "array"))

R: How to or should I drop an insignificant orthogonal polynomial basis in a linear model?

I have soil moisture data with x-, y- and z-coordinates like this:
gue <- structure(list(x = c(311939.1507, 311935.4607, 311924.7316, 311959.553,
311973.5368, 311953.3743, 311957.9409, 311948.3151, 311946.7169,
311997.0803, 312017.5236, 312006.0245, 312001.5179, 311992.7044,
311977.3076, 311960.4159, 311970.6047, 311957.2564, 311866.4246,
311870.8714, 311861.4461, 311928.7096, 311929.6291, 311929.4233,
311891.2915, 311890.3429, 311900.8905, 311864.4995, 311870.8143,
311866.9257, 312002.571, 312017.816, 312004.5024, 311947.1186,
311943.0152, 311952.2695, 311920.6095, 311929.8371, 311918.6095,
312011.9019, 311999.5755, 312011.1461, 311913.7251, 311925.3459,
311944.4701, 311910.2079, 311908.7618, 311896.0776, 311864.4814,
311856.9027, 311857.5747, 311967.3779, 311962.2024, 311956.8318,
311977.5254, 311971.1776, 311982.537, 311993.4709, 312004.6407,
312015.6118, 311990.8601, 311994.686, 311988.3037, 311990.518,
311986.3918, 311998.8876, 311923.9157, 311903.4563, 311915.714,
311856.9087, 311858.9812, 311874.5867, 311963.9099, 311938.4542,
311945.9505, 311804.3039, 311797.2571, 311791.6967, 311921.3965,
311928.9353, 311920.0597, 311833.5109, 311829.8683, 311847.6261,
311889.1243, 311902.4909, 311901.245, 311981.1118, 312005.7098,
311976.5858, 311819.8901, 311816.4143, 311819.4172, 311870.418,
311873.2656, 311888.3401, 311910.8377, 311897.6697, 311902.4571,
311846.8196, 311833.6235, 311846.2942, 311931.3916, 311930.1891,
311947.659, 311792.2642, 311793.2539, 311794.1931, 311795.1288,
311796.0806, 311797.0142, 311797.95, 311798.8822, 311799.8229,
311800.7774, 311801.7094, 311802.6395, 311803.583, 311804.5185,
311805.4558, 311806.391, 311807.3346, 311808.2757, 311809.2187,
311810.1549, 311811.1014, 311812.0366, 311812.9667, 311813.9107,
311814.8373, 311815.7777, 311816.7365, 311817.6522, 311818.6091,
311819.5335, 311820.4961, 311821.4337, 311822.3855, 311823.3195,
311824.2713, 311825.214, 311826.1705, 311827.1188, 311828.0501,
311828.9893, 311829.9324, 311830.8706, 311831.8181, 311832.7667,
311833.705, 311834.6546, 311835.609, 311836.5527, 311837.5157,
311838.4495, 311839.3926, 311840.3423, 311841.2799, 311842.2288,
311843.1691, 311844.118, 311845.0746, 311846.019, 311846.9709,
311847.9201, 311848.859, 311849.8105, 311850.7503, 311851.6889,
311852.6355, 311853.6045, 311854.5296, 311855.4717, 311856.4171,
311857.3759, 311858.3151, 311859.2604, 311860.2178, 311861.1636,
311862.1071, 311863.0347, 311863.9857, 311864.9316, 311865.8722,
311866.8158, 311867.7702, 311868.7155, 311869.649, 311870.6018,
311871.5449, 311872.4871, 311873.4352, 311874.385, 311875.3042,
311876.2617, 311877.2068, 311878.1429, 311879.0956, 311880.0401,
311880.9822, 311881.929, 311882.8651, 311883.8017, 311884.7429,
311885.6949, 311886.6349, 311887.7207, 311888.6653, 311889.6041,
311890.5358, 311891.4838, 311892.4292, 311893.3736, 311894.326,
311895.2703, 311896.2182, 311897.1635, 311898.1032, 311899.0496,
311899.9967, 311900.9456, 311901.8889, 311902.8162, 311903.7566,
311904.6996, 311905.6627, 311906.5899, 311907.5448, 311908.4856,
311909.4399, 311910.3649, 311911.3188, 311912.2629, 311913.2022,
311914.1527, 311915.1025, 311916.0425, 311916.985, 311917.9254,
311918.8661, 311919.8174, 311920.7668, 311921.7026, 311922.6517,
311923.5949, 311924.5252, 311925.4599, 311926.422, 311927.3646,
311928.3, 311929.2432, 311930.1796, 311931.1358, 311932.0726,
311933.0069, 311933.9585, 311934.845, 311935.7788, 311936.7193,
311937.6441, 311938.572, 311939.5094, 311940.4666, 311941.4067,
311942.3489, 311943.2712, 311944.2195, 311945.1536, 311946.0927,
311947.0413, 311947.9761, 311948.9082, 311949.8557, 311950.8201,
311951.7616, 311952.7148, 311953.7894, 311954.7289, 311955.6646,
311956.6081, 311957.5588, 311958.4896, 311959.4297, 311960.3761,
311961.3191, 311962.2653, 311963.195, 311964.1501, 311965.0856,
311966.0254, 311966.9739, 311967.9305, 311968.8592, 311971.7861,
311970.758, 311969.8205), y = c(5846548.408, 5846546.489, 5846538.014,
5846525.283, 5846510.302, 5846503.516, 5846529.769, 5846523.06,
5846522.742, 5846512.263, 5846525.347, 5846522.042, 5846537.487,
5846545.587, 5846532.112, 5846425.917, 5846406.543, 5846434.03,
5846500.989, 5846498.286, 5846487.134, 5846488.045, 5846483.29,
5846468.713, 5846534.269, 5846533.527, 5846504.056, 5846453.395,
5846438.43, 5846442.608, 5846406.8, 5846434.58, 5846419.229,
5846441.045, 5846436.903, 5846447.917, 5846460.757, 5846457.428,
5846451.067, 5846445.596, 5846474.031, 5846457.239, 5846532.694,
5846553.938, 5846565.323, 5846446.926, 5846432.549, 5846467.236,
5846473.963, 5846464.78, 5846498.142, 5846458.168, 5846474.018,
5846489.801, 5846559.513, 5846589.975, 5846555.723, 5846553.847,
5846560.066, 5846560.792, 5846455.642, 5846546.374, 5846465.999,
5846432.091, 5846422.061, 5846442.871, 5846485.956, 5846472.811,
5846506.756, 5846416.327, 5846419.623, 5846413.124, 5846587.334,
5846600.116, 5846589.515, 5846463.69, 5846456.712, 5846459.683,
5846600.118, 5846574.99, 5846597.804, 5846419.496, 5846437.615,
5846436.902, 5846567.872, 5846572.857, 5846556.904, 5846388.146,
5846393.088, 5846390.13, 5846481.09, 5846496.127, 5846493.586,
5846545.396, 5846532.126, 5846538.334, 5846388.343, 5846416.117,
5846392.223, 5846513.526, 5846486.644, 5846512.917, 5846395.509,
5846386.421, 5846383.873, 5846459.062, 5846459.36, 5846459.682,
5846460.026, 5846460.377, 5846460.703, 5846461.047, 5846461.378,
5846461.73, 5846462.071, 5846462.418, 5846462.765, 5846463.115,
5846463.466, 5846463.815, 5846464.128, 5846464.505, 5846464.843,
5846465.189, 5846465.52, 5846465.869, 5846466.217, 5846466.557,
5846466.893, 5846467.237, 5846467.586, 5846467.903, 5846468.274,
5846468.601, 5846468.943, 5846469.258, 5846469.592, 5846469.909,
5846470.247, 5846470.565, 5846470.891, 5846471.24, 5846471.536,
5846471.885, 5846472.224, 5846472.553, 5846472.884, 5846473.225,
5846473.532, 5846473.89, 5846474.179, 5846474.502, 5846474.827,
5846475.146, 5846475.448, 5846475.768, 5846476.102, 5846476.428,
5846476.746, 5846477.069, 5846477.37, 5846477.685, 5846478.009,
5846478.335, 5846478.656, 5846478.958, 5846479.299, 5846479.608,
5846479.926, 5846480.267, 5846480.603, 5846480.908, 5846481.246,
5846481.56, 5846481.877, 5846482.19, 5846482.503, 5846482.825,
5846483.144, 5846483.468, 5846483.811, 5846484.13, 5846484.458,
5846484.8, 5846485.125, 5846485.456, 5846485.778, 5846486.112,
5846486.421, 5846486.75, 5846487.08, 5846487.401, 5846487.744,
5846488.067, 5846488.39, 5846488.728, 5846489.067, 5846489.383,
5846489.716, 5846490.054, 5846490.38, 5846490.719, 5846491.044,
5846491.357, 5846491.694, 5846492.005, 5846492.402, 5846492.726,
5846493.045, 5846493.389, 5846493.708, 5846494.049, 5846494.363,
5846494.686, 5846494.982, 5846495.3, 5846495.64, 5846495.957,
5846496.263, 5846496.584, 5846496.911, 5846497.241, 5846497.591,
5846497.914, 5846498.226, 5846498.553, 5846498.893, 5846499.221,
5846499.538, 5846499.869, 5846500.19, 5846500.508, 5846500.82,
5846501.151, 5846501.492, 5846501.827, 5846502.147, 5846502.471,
5846502.803, 5846503.129, 5846503.46, 5846503.783, 5846504.11,
5846504.448, 5846504.76, 5846505.118, 5846505.445, 5846505.79,
5846506.106, 5846506.465, 5846506.795, 5846507.118, 5846507.448,
5846507.758, 5846508.081, 5846508.396, 5846508.645, 5846508.99,
5846509.34, 5846509.685, 5846510.031, 5846510.363, 5846510.693,
5846511.031, 5846511.362, 5846511.694, 5846512.024, 5846512.354,
5846512.701, 5846513.034, 5846513.353, 5846513.683, 5846513.998,
5846514.32, 5846514.636, 5846514.956, 5846515.326, 5846515.65,
5846515.968, 5846516.301, 5846516.634, 5846516.971, 5846517.318,
5846517.64, 5846517.952, 5846518.308, 5846518.626, 5846518.937,
5846519.27, 5846519.597, 5846519.921, 5846520.245, 5846520.581,
5846521.498, 5846521.209, 5846520.893), z = c(26.485, 26.411,
26.339, 27.248, 27.208, 26.799, 27.199, 27.023, 26.973, 26.908,
26.275, 26.474, 26.316, 26.226, 27.184, 25.903, 25.765, 25.931,
26.057, 26.181, 26.102, 26.436, 26.457, 26.396, 25.585, 25.572,
26.448, 25.637, 25.603, 25.634, 25.847, 26.185, 25.899, 26.016,
25.873, 26.299, 26.358, 26.344, 26.088, 26.264, 26.3, 26.306,
26.311, 25.857, 26.004, 25.824, 25.798, 26.326, 26.03, 25.625,
25.78, 26.368, 26.225, 26.582, 26.398, 25.343, 26.253, 25.908,
25.323, 25.381, 26.3, 26.179, 26.284, 26.024, 25.896, 26.251,
26.447, 26.385, 26.419, 25.188, 25.176, 25.169, 25.348, 25.188,
25.291, 25.285, 25.266, 25.262, 25.333, 25.308, 25.314, 25.145,
25.172, 25.22, 25.235, 25.204, 25.286, 25.155, 25.397, 25.202,
25.373, 25.327, 25.341, 25.172, 25.253, 25.318, 25.023, 25.24,
25.132, 25.264, 25.38, 25.221, 25.119, 25.179, 25.083, 25.258,
25.254, 25.235, 25.252, 25.266, 25.256, 25.264, 25.26, 25.262,
25.265, 25.265, 25.285, 25.28, 25.257, 25.254, 25.258, 25.287,
25.294, 25.282, 25.27, 25.268, 25.309, 25.303, 25.3, 25.312,
25.305, 25.3, 25.314, 25.319, 25.328, 25.304, 25.325, 25.308,
25.332, 25.333, 25.333, 25.346, 25.344, 25.339, 25.355, 25.362,
25.36, 25.391, 25.418, 25.434, 25.436, 25.447, 25.486, 25.5,
25.526, 25.552, 25.551, 25.564, 25.589, 25.606, 25.641, 25.672,
25.689, 25.709, 25.736, 25.758, 25.782, 25.836, 25.844, 25.866,
25.88, 25.935, 25.984, 26.037, 26.066, 26.071, 26.094, 26.106,
26.106, 26.118, 26.1, 26.146, 26.135, 26.156, 26.169, 26.162,
26.173, 26.198, 26.196, 26.228, 26.258, 26.276, 26.283, 26.277,
26.236, 26.277, 26.251, 26.264, 26.26, 26.261, 26.249, 26.307,
26.289, 26.243, 26.206, 26.231, 26.224, 26.238, 26.244, 26.245,
26.254, 26.2, 26.229, 26.24, 26.248, 26.223, 26.29, 26.344, 26.371,
26.364, 26.311, 26.343, 26.342, 26.334, 26.317, 26.342, 26.315,
26.312, 26.322, 26.325, 26.324, 26.32, 26.308, 26.329, 26.31,
26.32, 26.327, 26.34, 26.371, 26.442, 26.442, 26.483, 26.504,
26.526, 26.562, 26.562, 26.538, 26.534, 26.533, 26.541, 26.584,
26.642, 26.65, 26.691, 26.719, 26.755, 26.786, 26.794, 26.849,
26.867, 26.919, 26.93, 26.945, 26.947, 26.959, 26.984, 26.992,
27.006, 27.035, 27.021, 27.052, 27.094, 27.104, 27.119, 27.16,
27.182, 27.223, 27.236, 27.267, 27.304, 27.331, 27.348, 27.341,
27.379, 27.355, 27.378, 27.357, 27.373, 27.319, 27.299, 27.278,
27.28, 27.295, 27.288, 27.286, 27.279), soil_m_sat = c(24.1,
24.2, 26.9, 13.9, 20.6, 34.1, 16.2, 16.7, 16, 22.1, 23.9, 27.2,
26.8, 34.4, 26.3, 54.1, 51, 44.9, 46.4, 45.9, 54.7, 39.1, 38.7,
40.7, 56.5, 56.3, 40.6, 60.9, 56.8, 56.3, 40.7, 40.4, 44.1, 44.9,
46.2, 45.3, 46.1, 43.7, 44.9, 45.4, 33.1, 45.8, 27.6, 47.8, 37.3,
58.9, 51.4, 42.1, 46, 66.6, 51.1, 31.6, 48.7, 32.9, 28.1, 84,
37.7, 38.2, 80.4, 73.3, 35.6, 44.2, 39.7, 50.2, 49.9, 37.8, 37,
41.7, 27.3, 100, 100, 100, 80.9, 100, 88.4, 89.6, 93.8, 95.3,
91.9, 93.9, 96.1, 91.4, 100, 94.4, 100, 100, 80, 94.1, 84.4,
91.1, 80, 78.9, 85.9, 100, 97.5, 87.2, 88.6, 83.3, 90.7, 100,
82.2, 100, 96.3, 93.3, 99.6, 92.1, 92.8, 90.9, 92.3, 91.2, 94.5,
91.8, 89.4, 87, 86, 88, 83.7, 88.8, 92.9, 89.3, 83.3, 83.5, 84.5,
85.8, 87.4, 86.5, 82, 78.1, 85.8, 85.6, 88.7, 87.7, 84.9, 82,
87.9, 85.5, 86, 82, 83, 88.5, 81.2, 81.6, 76.5, 77.6, 84.5, 81.5,
82, 82.4, 68, 67.7, 62.1, 68.9, 61.7, 68.5, 68.6, 65.3, 59.5,
60.8, 67.3, 66.2, 59.9, 50.9, 46.9, 44.6, 47.9, 53, 52.1, 48.3,
41.3, 53.8, 51, 47, 53.7, 49.5, 51.1, 44.4, 35.1, 42.2, 41.5,
40, 48.2, 46.7, 48.6, 51.7, 51.2, 52.3, 53.4, 48.9, 50.7, 48.5,
46.5, 39.4, 38, 49.2, 43.6, 47.1, 40.4, 44.7, 45.7, 38.1, 41.9,
39.3, 40.2, 43.8, 47.3, 50.1, 41.2, 39.8, 46, 40.8, 40, 37.8,
42.6, 46, 43.8, 45.4, 42.2, 46.5, 40.4, 39.9, 53, 44.7, 35.8,
42.9, 43.9, 43.2, 40.6, 40.8, 32.2, 32.6, 33.5, 36.7, 34.6, 34.7,
50.9, 35.6, 34.2, 28.1, 42, 32, 42.3, 30, 29.6, 31, 29.8, 26,
37.8, 40, 37, 30.2, 28.2, 26.2, 27.4, 22.1, 28.4, 23.2, 24.8,
26.5, 23.9, 21.1, 27.2, 20.8, 12.5, 14, 17.9, 19.7, 19.4, 26,
16.7, 18.2, 23.9, 19, 25.9, 24.4, 22.1, 19.2, 18.4, 24.7, 17.3,
19.4, 19.6, 17.7, 21.3, 22.1, 17.9, 28.2, 16.3, 25.3, 19.7, 21.7,
19, 18.8, 11.8, 15.6, 9.8, 17.7)), .Names = c("x", "y", "z",
"soil_m_sat"), class = "data.frame", row.names = c(NA, -296L))
In order to estimate a variogram for this data I need to remove the spatial trend from it. The soil moisture, of course, varies with the surface: the higher a point is, the drier it is. And since this soil moisture data is given in percent, the relationship is hardly linear, which leads me to allow up to cubic dependence of the soil moisture on the z-coordinate. It happens that in this area there is a small, more or less elliptic elevation, so I want to allow the soil moisture to depend on the x- and y-coordinates in a quadratic way. I hope the following model does exactly this:
polymod <- lm(soil_m_sat ~ poly(x + y, degree = 2) + poly(z, degree = 3), data = gue)
summary(polymod)
The summary shows me that the first coefficient of the x- and y-dependency (what summary names poly(x + y, degree = 2)1) is not significant. Because the help page of poly() told me that it "returns or evaluates orthogonal polynomials of degree 1 to degree", I thought removing a degree-one polynomial from the model might be the same as removing the first coefficient of the degree-2 polynomial. Therefore I tried to remove it like this:
mod <- lm(soil_m_sat ~ poly(x + y, degree = 2) - poly(x + y, degree = 1) + poly(z, degree = 3), data = gue)
summary(mod)
But the summary of mod looks exactly the same as the summary of polymod, meaning mod does not differ from polymod. How is it possible to remove the insignificant component then?
No, don't check with summary in this case. You should use anova. A polynomial term from poly(), or a spline term from bs(), contains more than one coefficient, so it is more like a factor variable with multiple levels.
> anova(polymod)
Analysis of Variance Table
Response: soil_m_sat
                         Df Sum Sq Mean Sq F value    Pr(>F)
poly(x + y, degree = 2)   2 113484   56742  1600.8 < 2.2e-16 ***
poly(z, degree = 3)       3  68538   22846   644.5 < 2.2e-16 ***
Residuals               290  10280      35
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The ANOVA table clearly shows that you need all model terms. Do not drop any.
But I still need to answer your question and make you feel happy.
It is not impossible to drop the poly(x + y, degree = 2)1 term, but you need to access the model matrix for that purpose. You may do
gue$XY_poly <- with(gue, poly(x + y, degree = 2))[, 2] ## use the 2nd column only
fit <- lm(soil_m_sat ~ XY_poly + poly(z, degree = 3), data = gue)
summary(fit)
## ...
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)            52.3071     0.3459 151.217  < 2e-16 ***
XY_poly               -18.8515     7.3894  -2.551   0.0112 *
poly(z, degree = 3)1 -418.1634     6.4937 -64.395  < 2e-16 ***
poly(z, degree = 3)2  116.5327     6.9171  16.847  < 2e-16 ***
poly(z, degree = 3)3  -28.7773     5.9517  -4.835 2.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.951 on 291 degrees of freedom
Multiple R-squared: 0.9464, Adjusted R-squared: 0.9457
F-statistic: 1285 on 4 and 291 DF, p-value: < 2.2e-16
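Two points behind this answer can be checked on a toy example. The sketch below uses a made-up covariate u (not the gue data) and illustrative names such as P2 and quad_only. It shows that subtracting a term that is not literally in the formula leaves the model unchanged, and that the first column of the degree-2 basis is the degree-1 polynomial, so selecting column 2 keeps only the quadratic part:

```r
set.seed(42)                         # made-up covariate, for illustration only
u <- runif(100)

# "-" removes terms by label; poly(u, degree = 1) is not a term of the
# model, so subtracting it changes nothing
labs <- attr(terms(y ~ poly(u, degree = 2) - poly(u, degree = 1)),
             "term.labels")
labs

P2 <- poly(u, degree = 2)            # n x 2 orthogonal polynomial basis
P1 <- poly(u, degree = 1)            # n x 1 basis

# The columns are orthonormal: the Gram matrix is numerically the identity
gram <- crossprod(matrix(as.numeric(P2), ncol = 2))
round(gram, 10)

# Column 1 of the degree-2 basis equals the degree-1 polynomial
same_linear <- all.equal(as.numeric(P2[, 1]), as.numeric(P1[, 1]))

# Keeping only the quadratic column, as done with XY_poly above
quad_only <- P2[, 2]
```

Whether dropping the linear column is statistically sensible is a separate question; the ANOVA table above suggests keeping the whole term.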
