find Optimal degree of polynomial using anova test in R - r

data=structure(list(Y = c(31.2, 33.3, 35.6, 36.4, 36.7, 38.4, 40.4,
40.3, 41.8, 40.4, 40.7, 40.1, 42.7, 44.1, 46.7, 50.6, 50.1, 51.7,
52.9), X1 = c(492.9, 528.6, 560.3, 624.6, 666.4, 717.8, 768.2,
843.3, 911.6, 931.1, 1021.5, 1165.9, 1349.6, 1449.4, 1575.5,
1759.1, 1994.2, 2258.1, 2478.7), X2 = c(37.3, 38.1, 39.3, 37.8,
38.4, 40.1, 38.6, 39.8, 39.7, 52.1, 48.9, 58.3, 57.9, 56.5, 63.7,
61.6, 58.9, 66.4, 70.4), X3 = c(54.7, 63.7, 69.8, 65.9, 64.5,
70, 73.2, 67.8, 79.1, 95.4, 94.2, 123.5, 129.9, 117.6, 130.9,
129.8, 128, 141, 168.2), X4 = c(77.4, 80.2, 80.4, 83.9, 85.5,
93.7, 106.1, 104.8, 114, 124.1, 127.6, 142.9, 143.6, 139.2, 165.5,
203.3, 219.6, 221.6, 232.6)), class = "data.frame", row.names = c(NA,
-19L))
I want Build several polynomial regressions using polynomials of varying degrees. But the question how to choose the optimal degree of polynomial for modeling this dependence using the ANOVA test?
I use such way
fit1 <- lm(Y ~ X1 + I(X1^2)+X2 + I(X2^3)+X3 + I(X3^2)+X4 + I(X4^2), data=data)
summary(fit1)
AIC(fit1)
but how using loops define optimal degree of polynomial.
Optimal degree of polynomial Is it in which the smallest AIC and largest F-value.

Using stepwise regression and the AIC criterion:
fm <- lm(Y ~ .^3, data)
step(fm)
giving:
Call:
lm(formula = Y ~ X1 + X2 + X3 + X4 + X1:X2 + X1:X3 + X1:X4 +
X2:X3 + X2:X4 + X3:X4 + X1:X2:X4 + X2:X3:X4, data = data)
Coefficients:
(Intercept) X1 X2 X3 X4 X1:X2
5.204e+00 1.388e-02 -2.004e+00 2.701e+00 -1.757e-01 -2.628e-03
X1:X3 X1:X4 X2:X3 X2:X4 X3:X4 X1:X2:X4
1.246e-03 7.076e-04 -2.836e-02 4.470e-02 -3.541e-02 -1.244e-05
X2:X3:X4
2.782e-04
There is also stepAIC in the MASS package.
A different approach is lasso regression.
library(glmnet)
X <- model.matrix(fm)
fm2 <- cv.glmnet(X, data$Y)
coef(fm2, s = fm2$lambda.min)
giving:
(Intercept) 2.829759e+01
(Intercept) .
X1 9.319929e-03
X2 -8.658000e-02
X3 .
X4 6.620549e-02
X1:X2 .
X1:X3 .
X1:X4 .
X2:X3 .
X2:X4 .
X3:X4 .
X1:X2:X3 .
X1:X2:X4 .
X1:X3:X4 -8.318340e-08
X2:X3:X4 .
Check out this link for a variety of approaches https://rpubs.com/McCloud77/300677

Related

Unexpected result while using lowess to smooth a data.table column in R

I have a data.table test_dt in which I want to smooth the y column using lowess function.
test_dt <- structure(list(x = c(28.75, 30, 31.25, 32.5, 33.75, 35, 36.25,
37.5, 38.75, 40, 41.25, 42.5, 43.75, 45, 46.25, 47.5, 48.75,
50, 52.5, 55, 57.5, 60, 62.5, 63.75, 65, 67.5, 70, 72.5, 75,
77.5, 80, 82.5, 85, 87.5, 90, 92.5, 95, 97.5, 100, 102.5, 103.75,
105, 106.25, 107.5, 108.75, 110, 111.25, 112.5, 113.75, 115,
116.25, 117.5, 118.75, 120, 121.25, 122.5, 125, 130, 135, 140,
145), y = c(116.78, 115.53, 114.28, 113.05, 111.78, 110.53, 109.28,
108.05, 106.78, 105.53, 104.28, 103.025, 101.775, 100.525, 99.28,
98.05, 96.8, 95.525, 93.1, 90.65, 88.225, 85.775, 83.35, 82.15,
80.9, 78.5, 76.075, 73.675, 71.25, 68.85, 66.5, 64.075, 61.725,
59.4, 57.075, 54.725, 52.475, 50.225, 48, 45.75, 44.65, 43.55,
42.475, 41.45, 40.35, 39.275, 38.25, 37.225, 36.175, 35.175,
34.175, 33.225, 32.275, 31.3, 30.35, 29.45, 27.625, 24.175, 21,
18.125, 15.55), z = c(116.778248424972, 115.531456655985, 114.284502467544,
113.034850770519, 111.784500981402, 110.533319511795, 109.284500954429,
108.034850457264, 106.784502297216, 105.531265565238, 104.278221015846,
103.026780249377, 101.775992395759, 100.528761292272, 99.2853168637851,
98.043586202838, 96.8021989104315, 95.5702032427799, 93.1041279347743,
90.6575956222915, 88.2179393348852, 85.783500434839, 83.3503011023971,
82.136280706039, 80.922846825298, 78.4965179152157, 76.0823895453039,
73.6686672097464, 71.264486719796, 68.8702598156142, 66.4865368523571,
64.1182523898466, 61.7552221811808, 59.4004347738795, 57.0823289450761,
54.7908645949795, 52.5071096685879, 50.2308279167219, 47.9940967492558,
45.7658417529877, 44.6514226583931, 43.5622751034012, 42.4876666190815,
41.4173110074806, 40.3555584369672, 39.3004471381618, 38.2552969838653,
37.2202353638959, 36.1963659189447, 35.1889616530209, 34.2004259883859,
33.2295174626826, 32.2669278456991, 31.3171387914754, 30.3742375589802,
29.4555719783757, 27.6243725086786, 23.9784367995753, 27.625,
27.625, 27.625)), row.names = c(NA, -61L), class = c("data.table",
"data.frame"))
As can be seen in the image below, I am getting an unexpected result. The expected result is that the line (z column) in the graph below should closely follow the points (y column).
Here is my code -
library(data.table)
library(ggplot2)
test_dt[, z := lowess(x = x, y = y, f = 0.1)$y]
ggplot(test_dt) + geom_point(aes(x, y)) + geom_line(aes(x, z))
Q1. Can someone suggest why lowess is not smoothing properly?
Q2. Since lowess is not working as expected, is there any other function in R that would be more efficient in smoothing the y column without producing a spike (as lowess did on the boundary points)?
You could use loess instead:
test_dt[, z := predict(loess(y ~ x, data = test_dt))]
ggplot(test_dt) + geom_point(aes(x, y)) + geom_line(aes(x, z))
Note though, that if all you want to do is plot the line, this is exactly the method that geom_smooth uses, so without even creating a z column, you could do:
ggplot(test_dt, aes(x, y)) + geom_point() + geom_smooth()
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2021-11-07 by the reprex package (v2.0.0)
The problem got solved by keeping the number of iterations to zero.lowess acts like loess when iterations are kept at zero.
test_dt[, z := lowess(x = x, y = y, f = 0.1, iter=0)$y]

Overlaying a histogram with normal distribution

I neeed to draw a histogram of data (weights) overlayed with a line of the expected normal distributioh
I am totally new to R and statistics. I know I probably got something fundamentally wrong about frequencies density and dnorm, but I am stuck.
weights <- c(97.6,95,94.3 ,92.3 ,90.7 ,89.4 ,88.2 ,86.9 ,85.8 ,85.5 ,84.4 ,84.1 ,82.5 ,81.4 ,80.8 ,80 ,79.8 ,79.5 ,78.4 ,78.4 ,78.2 ,78.1 ,78 ,77.4 ,76.5 ,75.4 ,74.8 ,74.1 ,73.5 ,73.2 ,73 ,72.3 ,72.3 ,72.2 ,71.8 ,71.7 ,71.6 ,71.6 ,71.5 ,71.3 ,70.7 ,70.6 ,70.5 ,69.2 ,68.6 ,68.3 ,67.5 ,67 ,66.8 ,66.6 ,65.8 ,65.6 ,64.9 ,64.6 ,64.5 ,64.5 ,64.3 ,64.2 ,63.9 ,63.7 ,62.7 ,62.3 ,62.2 ,59.4 ,57.8 ,57.8 ,57.6 ,56.4 ,53.6 ,53.2 )
hist(weights)
m <- mean(weights)
sd <- sd(weights)
x <- seq(min(weights), max(weights), length.out length(weights))
xn <- dnorm(x, mean = m, sd = sd) * length(weights) #what is the correct factor???
lines(x, xn)
I expected the line to follow the histogram approximately, but it is too low in the histogram
what you need is to plot the histogram with the frequency of the examples and then plot the density of the weights, i.e.
weights = c(97.6,95,94.3 ,92.3 ,90.7 ,89.4 ,88.2 ,86.9 ,85.8 ,85.5 ,84.4 ,84.1 ,82.5 ,81.4 ,80.8 ,80 ,79.8 ,79.5 ,78.4 ,78.4 ,78.2 ,78.1 ,78 ,77.4 ,76.5 ,75.4 ,74.8 ,74.1 ,73.5 ,73.2 ,73 ,72.3 ,72.3 ,72.2 ,71.8 ,71.7 ,71.6 ,71.6 ,71.5 ,71.3 ,70.7 ,70.6 ,70.5 ,69.2 ,68.6 ,68.3 ,67.5 ,67 ,66.8 ,66.6 ,65.8 ,65.6 ,64.9 ,64.6 ,64.5 ,64.5 ,64.3 ,64.2 ,63.9 ,63.7 ,62.7 ,62.3 ,62.2 ,59.4 ,57.8 ,57.8 ,57.6 ,56.4 ,53.6 ,53.2 )
hist(weights, prob = T)
lines(density(weights), col = "red")
Hope this helps.
The problem in your code is that hist plots frequencies and dnorm calculates densities.
You can try making a histogram with densities and then you will see the histogram or the line just adding freq=F to the histogram:
hist(weights, freq = F)
You're nearly there, you just have to factor in the histogram bin widths.
weights <- c(97.6, 95, 94.3, 92.3, 90.7, 89.4, 88.2, 86.9, 85.8,
85.5, 84.4, 84.1, 82.5, 81.4, 80.8, 80, 79.8, 79.5, 78.4, 78.4,
78.2, 78.1, 78, 77.4, 76.5, 75.4, 74.8, 74.1, 73.5, 73.2, 73,
72.3, 72.3, 72.2, 71.8, 71.7, 71.6, 71.6, 71.5, 71.3, 70.7,
70.6, 70.5, 69.2, 68.6, 68.3, 67.5, 67, 66.8, 66.6, 65.8, 65.6,
64.9, 64.6, 64.5, 64.5, 64.3, 64.2, 63.9, 63.7, 62.7, 62.3,
62.2, 59.4, 57.8, 57.8, 57.6, 56.4, 53.6, 53.2)
h <- hist(weights, freq=TRUE)
binwi <- diff(h$breaks)[1]
x <- seq(min(weights)-10, max(weights)+10, 0.01)
xn <- dnorm(x, mean=mean(weights), sd=sd(weights)) * length(weights) * binwi
lines(x, xn)

How can I get standard errors for my 4 parameters when the Hessian matrix from solnp is 5 by 5?

I'm using the solnp() function in the R package Rsolnp to solve a nonlinear regression with constraints. It works well, converges with no problem. I want to use the Hessian matrix to calculate standard errors of the four parameter estimates, but the Hessian is not 4 by 4 as I had expected, but 5 by 5. I looked around on SO and didn't see anyone else with an unexpected Hessian size. All the examples I found with the Hessians printed showed them to be the expected size of p by p (e.g., 2x2, 3x3, and 4x4).
How can I get standard errors for my 4 parameters from this 5 by 5 Hessian?
df <- data.frame(
Recruit.N = c(78.4, 79.8, 106, 57.4, 81.7, 94.4, 74.1, 42, 61.6, 47.7, 61.8,
28.1, 32.3, 19, 23.4, 20.1, 27),
Stock.5 = c(66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4, 81.7, 94.4,
74.1, 42, 61.6, 47.7, 61.8, 28.1),
Stock.6 = c(25.2, 66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4, 81.7,
94.4, 74.1, 42, 61.6, 47.7, 61.8),
Stock.7 = c(23.8, 25.2, 66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4,
81.7, 94.4, 74.1, 42, 61.6, 47.7)
)
lossfcn <- function(parz, mydat) {
alpha <- parz[[1]]
beta <- parz[[2]]
p5 <- parz[[3]]
p6 <- parz[[4]]
p7 <- 1 - p5 - p6
S <- with(mydat, p5*Stock.5 + p6*Stock.6 + p7*Stock.7)
Obs <- mydat$Recruit.N
Pred <- alpha * S * exp(-beta*S)
Resid <- log(Obs) - log(Pred)
sigma <- sqrt(mean(Resid^2))
LL <- dlnorm(Obs, meanlog=log(Pred), sdlog=sigma, log=TRUE)
-sum(LL)
}
inequal <- function(parz, mydat) {
parz[3] + parz[4]
}
library(Rsolnp)
solnp(pars=c(1, 0.008, 1/3, 1/3), fun=lossfcn, mydat=df,
ineqfun=inequal, ineqLB=0, ineqUB=1,
LB=c(0, 0, 0, 0), UB=c(1000, 1000, 1, 1), control=list(trace=0))
$pars
[1] 6.731317e-01 1.888572e-10 8.141363e-01 1.858631e-01
$convergence
[1] 0
$values
[1] 79.87150 75.50927 75.50927 75.50927
$lagrange
[,1]
[1,] -2.028222
$hessian
[,1] [,2] [,3] [,4] [,5]
[1,] 0.3350868 -3.359077e-01 17.84919 -0.4306057 -0.3382811
[2,] -0.3359077 1.993956e+02 -10161.63351 -7.0844295 -2.2749785
[3,] 17.8491854 -1.016163e+04 548099.69224 -85.9544831 -224.0362766
[4,] -0.4306057 -7.084429e+00 -85.95448 25.1086694 5.8817704
[5,] -0.3382811 -2.274979e+00 -224.03628 5.8817704 4.1978178
$ineqx0
[1] 0.9999995
$nfuneval
[1] 142
$outer.iter
[1] 3
$elapsed
Time difference of 0.03016496 secs
$vscale
[1] 1 1 1 1 1 1
Unlike the 3 posts you linked, you have an inequality constraint. Check the ineqx0 in the returned values: the other posts have NULL but you have 0.9999995. With an inequality constraint, there is a slack variable, so the problem is augmented. The hessian matrix returned is for this augmented set of parameters. Just retain the first 4 x 4 submatrix of the hessian for your wanted parameters.

R: How to or should I drop an insignificant orthogonal polynomial basis in a linear model?

I have soil moisture data with x-, y- and z-coordinates like this:
gue <- structure(list(x = c(311939.1507, 311935.4607, 311924.7316, 311959.553,
311973.5368, 311953.3743, 311957.9409, 311948.3151, 311946.7169,
311997.0803, 312017.5236, 312006.0245, 312001.5179, 311992.7044,
311977.3076, 311960.4159, 311970.6047, 311957.2564, 311866.4246,
311870.8714, 311861.4461, 311928.7096, 311929.6291, 311929.4233,
311891.2915, 311890.3429, 311900.8905, 311864.4995, 311870.8143,
311866.9257, 312002.571, 312017.816, 312004.5024, 311947.1186,
311943.0152, 311952.2695, 311920.6095, 311929.8371, 311918.6095,
312011.9019, 311999.5755, 312011.1461, 311913.7251, 311925.3459,
311944.4701, 311910.2079, 311908.7618, 311896.0776, 311864.4814,
311856.9027, 311857.5747, 311967.3779, 311962.2024, 311956.8318,
311977.5254, 311971.1776, 311982.537, 311993.4709, 312004.6407,
312015.6118, 311990.8601, 311994.686, 311988.3037, 311990.518,
311986.3918, 311998.8876, 311923.9157, 311903.4563, 311915.714,
311856.9087, 311858.9812, 311874.5867, 311963.9099, 311938.4542,
311945.9505, 311804.3039, 311797.2571, 311791.6967, 311921.3965,
311928.9353, 311920.0597, 311833.5109, 311829.8683, 311847.6261,
311889.1243, 311902.4909, 311901.245, 311981.1118, 312005.7098,
311976.5858, 311819.8901, 311816.4143, 311819.4172, 311870.418,
311873.2656, 311888.3401, 311910.8377, 311897.6697, 311902.4571,
311846.8196, 311833.6235, 311846.2942, 311931.3916, 311930.1891,
311947.659, 311792.2642, 311793.2539, 311794.1931, 311795.1288,
311796.0806, 311797.0142, 311797.95, 311798.8822, 311799.8229,
311800.7774, 311801.7094, 311802.6395, 311803.583, 311804.5185,
311805.4558, 311806.391, 311807.3346, 311808.2757, 311809.2187,
311810.1549, 311811.1014, 311812.0366, 311812.9667, 311813.9107,
311814.8373, 311815.7777, 311816.7365, 311817.6522, 311818.6091,
311819.5335, 311820.4961, 311821.4337, 311822.3855, 311823.3195,
311824.2713, 311825.214, 311826.1705, 311827.1188, 311828.0501,
311828.9893, 311829.9324, 311830.8706, 311831.8181, 311832.7667,
311833.705, 311834.6546, 311835.609, 311836.5527, 311837.5157,
311838.4495, 311839.3926, 311840.3423, 311841.2799, 311842.2288,
311843.1691, 311844.118, 311845.0746, 311846.019, 311846.9709,
311847.9201, 311848.859, 311849.8105, 311850.7503, 311851.6889,
311852.6355, 311853.6045, 311854.5296, 311855.4717, 311856.4171,
311857.3759, 311858.3151, 311859.2604, 311860.2178, 311861.1636,
311862.1071, 311863.0347, 311863.9857, 311864.9316, 311865.8722,
311866.8158, 311867.7702, 311868.7155, 311869.649, 311870.6018,
311871.5449, 311872.4871, 311873.4352, 311874.385, 311875.3042,
311876.2617, 311877.2068, 311878.1429, 311879.0956, 311880.0401,
311880.9822, 311881.929, 311882.8651, 311883.8017, 311884.7429,
311885.6949, 311886.6349, 311887.7207, 311888.6653, 311889.6041,
311890.5358, 311891.4838, 311892.4292, 311893.3736, 311894.326,
311895.2703, 311896.2182, 311897.1635, 311898.1032, 311899.0496,
311899.9967, 311900.9456, 311901.8889, 311902.8162, 311903.7566,
311904.6996, 311905.6627, 311906.5899, 311907.5448, 311908.4856,
311909.4399, 311910.3649, 311911.3188, 311912.2629, 311913.2022,
311914.1527, 311915.1025, 311916.0425, 311916.985, 311917.9254,
311918.8661, 311919.8174, 311920.7668, 311921.7026, 311922.6517,
311923.5949, 311924.5252, 311925.4599, 311926.422, 311927.3646,
311928.3, 311929.2432, 311930.1796, 311931.1358, 311932.0726,
311933.0069, 311933.9585, 311934.845, 311935.7788, 311936.7193,
311937.6441, 311938.572, 311939.5094, 311940.4666, 311941.4067,
311942.3489, 311943.2712, 311944.2195, 311945.1536, 311946.0927,
311947.0413, 311947.9761, 311948.9082, 311949.8557, 311950.8201,
311951.7616, 311952.7148, 311953.7894, 311954.7289, 311955.6646,
311956.6081, 311957.5588, 311958.4896, 311959.4297, 311960.3761,
311961.3191, 311962.2653, 311963.195, 311964.1501, 311965.0856,
311966.0254, 311966.9739, 311967.9305, 311968.8592, 311971.7861,
311970.758, 311969.8205), y = c(5846548.408, 5846546.489, 5846538.014,
5846525.283, 5846510.302, 5846503.516, 5846529.769, 5846523.06,
5846522.742, 5846512.263, 5846525.347, 5846522.042, 5846537.487,
5846545.587, 5846532.112, 5846425.917, 5846406.543, 5846434.03,
5846500.989, 5846498.286, 5846487.134, 5846488.045, 5846483.29,
5846468.713, 5846534.269, 5846533.527, 5846504.056, 5846453.395,
5846438.43, 5846442.608, 5846406.8, 5846434.58, 5846419.229,
5846441.045, 5846436.903, 5846447.917, 5846460.757, 5846457.428,
5846451.067, 5846445.596, 5846474.031, 5846457.239, 5846532.694,
5846553.938, 5846565.323, 5846446.926, 5846432.549, 5846467.236,
5846473.963, 5846464.78, 5846498.142, 5846458.168, 5846474.018,
5846489.801, 5846559.513, 5846589.975, 5846555.723, 5846553.847,
5846560.066, 5846560.792, 5846455.642, 5846546.374, 5846465.999,
5846432.091, 5846422.061, 5846442.871, 5846485.956, 5846472.811,
5846506.756, 5846416.327, 5846419.623, 5846413.124, 5846587.334,
5846600.116, 5846589.515, 5846463.69, 5846456.712, 5846459.683,
5846600.118, 5846574.99, 5846597.804, 5846419.496, 5846437.615,
5846436.902, 5846567.872, 5846572.857, 5846556.904, 5846388.146,
5846393.088, 5846390.13, 5846481.09, 5846496.127, 5846493.586,
5846545.396, 5846532.126, 5846538.334, 5846388.343, 5846416.117,
5846392.223, 5846513.526, 5846486.644, 5846512.917, 5846395.509,
5846386.421, 5846383.873, 5846459.062, 5846459.36, 5846459.682,
5846460.026, 5846460.377, 5846460.703, 5846461.047, 5846461.378,
5846461.73, 5846462.071, 5846462.418, 5846462.765, 5846463.115,
5846463.466, 5846463.815, 5846464.128, 5846464.505, 5846464.843,
5846465.189, 5846465.52, 5846465.869, 5846466.217, 5846466.557,
5846466.893, 5846467.237, 5846467.586, 5846467.903, 5846468.274,
5846468.601, 5846468.943, 5846469.258, 5846469.592, 5846469.909,
5846470.247, 5846470.565, 5846470.891, 5846471.24, 5846471.536,
5846471.885, 5846472.224, 5846472.553, 5846472.884, 5846473.225,
5846473.532, 5846473.89, 5846474.179, 5846474.502, 5846474.827,
5846475.146, 5846475.448, 5846475.768, 5846476.102, 5846476.428,
5846476.746, 5846477.069, 5846477.37, 5846477.685, 5846478.009,
5846478.335, 5846478.656, 5846478.958, 5846479.299, 5846479.608,
5846479.926, 5846480.267, 5846480.603, 5846480.908, 5846481.246,
5846481.56, 5846481.877, 5846482.19, 5846482.503, 5846482.825,
5846483.144, 5846483.468, 5846483.811, 5846484.13, 5846484.458,
5846484.8, 5846485.125, 5846485.456, 5846485.778, 5846486.112,
5846486.421, 5846486.75, 5846487.08, 5846487.401, 5846487.744,
5846488.067, 5846488.39, 5846488.728, 5846489.067, 5846489.383,
5846489.716, 5846490.054, 5846490.38, 5846490.719, 5846491.044,
5846491.357, 5846491.694, 5846492.005, 5846492.402, 5846492.726,
5846493.045, 5846493.389, 5846493.708, 5846494.049, 5846494.363,
5846494.686, 5846494.982, 5846495.3, 5846495.64, 5846495.957,
5846496.263, 5846496.584, 5846496.911, 5846497.241, 5846497.591,
5846497.914, 5846498.226, 5846498.553, 5846498.893, 5846499.221,
5846499.538, 5846499.869, 5846500.19, 5846500.508, 5846500.82,
5846501.151, 5846501.492, 5846501.827, 5846502.147, 5846502.471,
5846502.803, 5846503.129, 5846503.46, 5846503.783, 5846504.11,
5846504.448, 5846504.76, 5846505.118, 5846505.445, 5846505.79,
5846506.106, 5846506.465, 5846506.795, 5846507.118, 5846507.448,
5846507.758, 5846508.081, 5846508.396, 5846508.645, 5846508.99,
5846509.34, 5846509.685, 5846510.031, 5846510.363, 5846510.693,
5846511.031, 5846511.362, 5846511.694, 5846512.024, 5846512.354,
5846512.701, 5846513.034, 5846513.353, 5846513.683, 5846513.998,
5846514.32, 5846514.636, 5846514.956, 5846515.326, 5846515.65,
5846515.968, 5846516.301, 5846516.634, 5846516.971, 5846517.318,
5846517.64, 5846517.952, 5846518.308, 5846518.626, 5846518.937,
5846519.27, 5846519.597, 5846519.921, 5846520.245, 5846520.581,
5846521.498, 5846521.209, 5846520.893), z = c(26.485, 26.411,
26.339, 27.248, 27.208, 26.799, 27.199, 27.023, 26.973, 26.908,
26.275, 26.474, 26.316, 26.226, 27.184, 25.903, 25.765, 25.931,
26.057, 26.181, 26.102, 26.436, 26.457, 26.396, 25.585, 25.572,
26.448, 25.637, 25.603, 25.634, 25.847, 26.185, 25.899, 26.016,
25.873, 26.299, 26.358, 26.344, 26.088, 26.264, 26.3, 26.306,
26.311, 25.857, 26.004, 25.824, 25.798, 26.326, 26.03, 25.625,
25.78, 26.368, 26.225, 26.582, 26.398, 25.343, 26.253, 25.908,
25.323, 25.381, 26.3, 26.179, 26.284, 26.024, 25.896, 26.251,
26.447, 26.385, 26.419, 25.188, 25.176, 25.169, 25.348, 25.188,
25.291, 25.285, 25.266, 25.262, 25.333, 25.308, 25.314, 25.145,
25.172, 25.22, 25.235, 25.204, 25.286, 25.155, 25.397, 25.202,
25.373, 25.327, 25.341, 25.172, 25.253, 25.318, 25.023, 25.24,
25.132, 25.264, 25.38, 25.221, 25.119, 25.179, 25.083, 25.258,
25.254, 25.235, 25.252, 25.266, 25.256, 25.264, 25.26, 25.262,
25.265, 25.265, 25.285, 25.28, 25.257, 25.254, 25.258, 25.287,
25.294, 25.282, 25.27, 25.268, 25.309, 25.303, 25.3, 25.312,
25.305, 25.3, 25.314, 25.319, 25.328, 25.304, 25.325, 25.308,
25.332, 25.333, 25.333, 25.346, 25.344, 25.339, 25.355, 25.362,
25.36, 25.391, 25.418, 25.434, 25.436, 25.447, 25.486, 25.5,
25.526, 25.552, 25.551, 25.564, 25.589, 25.606, 25.641, 25.672,
25.689, 25.709, 25.736, 25.758, 25.782, 25.836, 25.844, 25.866,
25.88, 25.935, 25.984, 26.037, 26.066, 26.071, 26.094, 26.106,
26.106, 26.118, 26.1, 26.146, 26.135, 26.156, 26.169, 26.162,
26.173, 26.198, 26.196, 26.228, 26.258, 26.276, 26.283, 26.277,
26.236, 26.277, 26.251, 26.264, 26.26, 26.261, 26.249, 26.307,
26.289, 26.243, 26.206, 26.231, 26.224, 26.238, 26.244, 26.245,
26.254, 26.2, 26.229, 26.24, 26.248, 26.223, 26.29, 26.344, 26.371,
26.364, 26.311, 26.343, 26.342, 26.334, 26.317, 26.342, 26.315,
26.312, 26.322, 26.325, 26.324, 26.32, 26.308, 26.329, 26.31,
26.32, 26.327, 26.34, 26.371, 26.442, 26.442, 26.483, 26.504,
26.526, 26.562, 26.562, 26.538, 26.534, 26.533, 26.541, 26.584,
26.642, 26.65, 26.691, 26.719, 26.755, 26.786, 26.794, 26.849,
26.867, 26.919, 26.93, 26.945, 26.947, 26.959, 26.984, 26.992,
27.006, 27.035, 27.021, 27.052, 27.094, 27.104, 27.119, 27.16,
27.182, 27.223, 27.236, 27.267, 27.304, 27.331, 27.348, 27.341,
27.379, 27.355, 27.378, 27.357, 27.373, 27.319, 27.299, 27.278,
27.28, 27.295, 27.288, 27.286, 27.279), soil_m_sat = c(24.1,
24.2, 26.9, 13.9, 20.6, 34.1, 16.2, 16.7, 16, 22.1, 23.9, 27.2,
26.8, 34.4, 26.3, 54.1, 51, 44.9, 46.4, 45.9, 54.7, 39.1, 38.7,
40.7, 56.5, 56.3, 40.6, 60.9, 56.8, 56.3, 40.7, 40.4, 44.1, 44.9,
46.2, 45.3, 46.1, 43.7, 44.9, 45.4, 33.1, 45.8, 27.6, 47.8, 37.3,
58.9, 51.4, 42.1, 46, 66.6, 51.1, 31.6, 48.7, 32.9, 28.1, 84,
37.7, 38.2, 80.4, 73.3, 35.6, 44.2, 39.7, 50.2, 49.9, 37.8, 37,
41.7, 27.3, 100, 100, 100, 80.9, 100, 88.4, 89.6, 93.8, 95.3,
91.9, 93.9, 96.1, 91.4, 100, 94.4, 100, 100, 80, 94.1, 84.4,
91.1, 80, 78.9, 85.9, 100, 97.5, 87.2, 88.6, 83.3, 90.7, 100,
82.2, 100, 96.3, 93.3, 99.6, 92.1, 92.8, 90.9, 92.3, 91.2, 94.5,
91.8, 89.4, 87, 86, 88, 83.7, 88.8, 92.9, 89.3, 83.3, 83.5, 84.5,
85.8, 87.4, 86.5, 82, 78.1, 85.8, 85.6, 88.7, 87.7, 84.9, 82,
87.9, 85.5, 86, 82, 83, 88.5, 81.2, 81.6, 76.5, 77.6, 84.5, 81.5,
82, 82.4, 68, 67.7, 62.1, 68.9, 61.7, 68.5, 68.6, 65.3, 59.5,
60.8, 67.3, 66.2, 59.9, 50.9, 46.9, 44.6, 47.9, 53, 52.1, 48.3,
41.3, 53.8, 51, 47, 53.7, 49.5, 51.1, 44.4, 35.1, 42.2, 41.5,
40, 48.2, 46.7, 48.6, 51.7, 51.2, 52.3, 53.4, 48.9, 50.7, 48.5,
46.5, 39.4, 38, 49.2, 43.6, 47.1, 40.4, 44.7, 45.7, 38.1, 41.9,
39.3, 40.2, 43.8, 47.3, 50.1, 41.2, 39.8, 46, 40.8, 40, 37.8,
42.6, 46, 43.8, 45.4, 42.2, 46.5, 40.4, 39.9, 53, 44.7, 35.8,
42.9, 43.9, 43.2, 40.6, 40.8, 32.2, 32.6, 33.5, 36.7, 34.6, 34.7,
50.9, 35.6, 34.2, 28.1, 42, 32, 42.3, 30, 29.6, 31, 29.8, 26,
37.8, 40, 37, 30.2, 28.2, 26.2, 27.4, 22.1, 28.4, 23.2, 24.8,
26.5, 23.9, 21.1, 27.2, 20.8, 12.5, 14, 17.9, 19.7, 19.4, 26,
16.7, 18.2, 23.9, 19, 25.9, 24.4, 22.1, 19.2, 18.4, 24.7, 17.3,
19.4, 19.6, 17.7, 21.3, 22.1, 17.9, 28.2, 16.3, 25.3, 19.7, 21.7,
19, 18.8, 11.8, 15.6, 9.8, 17.7)), .Names = c("x", "y", "z",
"soil_m_sat"), class = "data.frame", row.names = c(NA, -296L))
In order to estimate a variogram for this data I need to remove the spatial trend from it. The soil moisture, of course, varies with the surface - the higher a point is the dryer it is. And since this soil moisture data is percetagewise the relationship is hardly linear, what leads me to allow up to cubic dependencies of the soil moisture to the z-coordinate. It happens that in this area there is a small more or less elliptic elevation, so that I want to allow the soil moisture to be dependend of the x- and y-coordinates in a quadratic way. I hope the following model does exactly this:
polymod <- lm(soil_m_sat ~ poly(x + y, degree = 2) + poly(z, degree = 3), data = gue)
summary(polymod)
The summary shows me that there is no significance for the first coefficient of the x- and y-dependency (what summary names poly(x + y, degree = 2)1). Because the help page from poly() told me that it "returns or evaluates orthogonal polynomials of degree 1 to degree", I thought, removing a degree one polynom from the model might be the same as removing the first coefficient of the degree 2 polynom. Therefore I tried to remove it like this:
mod <- lm(soil_m_sat ~ poly(x + y, degree = 2) - poly(x + y, degree = 1) + poly(z, degree = 3), data = gue)
summary(mod)
But the summary of mod looks exactly the same as the summary of polymod, meaning mod does not differ from polymod. How is it possible to remove the unsignificant component then?
No, don't check with summary in this case. You should use anova. A polynomial term from poly(), or a spline term from bs() contains more than coefficients, so they are more like a factor variable with multiple levels.
> anova(polymod)
Analysis of Variance Table
Response: soil_m_sat
Df Sum Sq Mean Sq F value Pr(>F)
poly(x + y, degree = 2) 2 113484 56742 1600.8 < 2.2e-16 ***
poly(z, degree = 3) 3 68538 22846 644.5 < 2.2e-16 ***
Residuals 290 10280 35
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The ANOVA table clearly shows that you need all model terms. Do not drop any.
But I still need to answer your question and make you feel happy.
It is not impossible to drop the poly(x + y, degree = 2)1 term, but you need to access model matrix for such purpose. You may do
gue$XY_poly <- with(gue, poly(x + y, degree = 2))[, 2] ## use the 2nd column only
fit <- lm(soil_m_sat ~ XY_poly + poly(z, degree = 3), data = gue)
summary(fit)
## ...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.3071 0.3459 151.217 < 2e-16 ***
XY_poly -18.8515 7.3894 -2.551 0.0112 *
poly(z, degree = 3)1 -418.1634 6.4937 -64.395 < 2e-16 ***
poly(z, degree = 3)2 116.5327 6.9171 16.847 < 2e-16 ***
poly(z, degree = 3)3 -28.7773 5.9517 -4.835 2.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.951 on 291 degrees of freedom
Multiple R-squared: 0.9464, Adjusted R-squared: 0.9457
F-statistic: 1285 on 4 and 291 DF, p-value: < 2.2e-16

ks.test with left truncated weibull

The Distribution I accept my values to follow is a left truncated weibull distribution. I do know the parameters a, shape and scale of this distribution for using the ptrunc command:
require(truncdist);
ptrunc(x,"weibull",a=a,scale=b,shape=c)
so I want the ks.test command (see below) to use the described left truncated weibull distribution instead of the "normal weibull".
myvalues<-c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4);
a<-7;
scale<-36.37516;
shape<-9.437013;
So I do know, that in this case it is not necessary to do the left-side truncation. But in others it will be.
ks.test(myvalues,"pweibull",scale=b,shape=c) #for normal weibull
but
ks.test(myvalues,ptrunc(x,"weibull",a=a,scale=b,shape=c)) # for leftruncated
gives a wrong result.
First of all, ptrunc should be replaced by rtrunc. ptrunc gives a vector of probability values. But by the documentation of ks.test we need a sample, and this is what rtrunc gives us. If the argument a of rtrunc is set to -Inf, there is no truncation and the result with a=-Inf is indeed the same as with a=7:
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
a <- 7
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myvalues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myvalues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
.
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.21429, p-value = 0.2898
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.21429, p-value = 0.2898
alternative hypothesis: two-sided
But still the result of ks.test( myvalues, "pweibull",scale=scale,shape=shape ) is different:
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
The reason is that myvalues is too small. If we make it larger in the call of rtrunc (not ks.test), ks0, ks1, and ks2 are almost the same:
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
myManyValues <- c(outer((0:9999)/100000,myvalues,"+"))
a <- 7
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myManyValues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myManyValues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
.
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
Now let's see what happens when we do truncate the distribution:
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
myManyValues <- c(outer((0:9999)/100000,myvalues,"+"))
a <- 29
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myManyValues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myManyValues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
.
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.2059, p-value = 0.05683
alternative hypothesis: two-sided
You are using the ptrunc function incorrectly (I assume), it needs to be fed a sequence of quantiles. Below I calculate the mean and standard deviation of the Weibull based on your scale and shape parameters, then sample from 5 standard deviations above and below to produce a comparison set.
require(truncdist);
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4);
a <- 7;
scale <- 36.37516;
shape <- 9.437013;
# Calculate standard deviation of the weibull
weib_mean <- scale * gamma(1 + 1/shape)
weib_sd <- sqrt((scale^2) * (gamma(1 + 2/shape) - (gamma(1 + 1/shape))^2))
# Get a sample
quant <- seq(weib_mean - 5 * weib_sd, weib_mean + 5 * weib_sd, length.out = 1E5)
weibull_samp <- ptrunc(quant, "weibull", a = a, scale = scale, shape = shape)
# Take a look
plot(weibull_samp ~ quant)
# Use with test
> ks.test(sort(myvalues), weibull_samp)
Two-sample Kolmogorov-Smirnov test
data: sort(myvalues) and weibull_samp
D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided

Resources