ks.test with a left-truncated Weibull distribution in R

The distribution I expect my values to follow is a left-truncated Weibull distribution. I know the parameters a, shape and scale of this distribution, which I use with the ptrunc command:
require(truncdist);
ptrunc(x,"weibull",a=a,scale=scale,shape=shape)
So I want the ks.test command (see below) to use this left-truncated Weibull distribution instead of the ordinary Weibull.
myvalues<-c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1, 29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4);
a<-7;
scale<-36.37516;
shape<-9.437013;
I know that in this case the left truncation is not necessary, but in other cases it will be.
ks.test(myvalues,"pweibull",scale=scale,shape=shape) # for the ordinary Weibull
but
ks.test(myvalues,ptrunc(x,"weibull",a=a,scale=scale,shape=shape)) # for the left-truncated Weibull
gives a wrong result.

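First of all, ptrunc should be replaced by rtrunc here. ptrunc returns a vector of probability values, and when ks.test receives a numeric vector as its second argument it treats it as a second sample, so the data end up being compared with probabilities. rtrunc, in contrast, draws a random sample from the truncated distribution, which is a valid second sample. If the argument a of rtrunc is set to -Inf, there is no truncation, and the result with a=-Inf is indeed the same as with a=7: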
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
a <- 7
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myvalues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myvalues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.21429, p-value = 0.2898
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.21429, p-value = 0.2898
alternative hypothesis: two-sided
But the result of the one-sample test ks.test( myvalues, "pweibull",scale=scale,shape=shape ) is still different:
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
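The reason is that the simulated sample is too small: rtrunc draws as many values as its first argument has elements (here the 42 values of myvalues). If we make that argument much larger in the call of rtrunc (not in ks.test), ks0, ks1 and ks2 are almost the same. Below, myManyValues has 10000 * 42 elements; only its length matters to rtrunc: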
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
myManyValues <- c(outer((0:9999)/100000,myvalues,"+"))
a <- 7
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myManyValues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myManyValues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
Now let's see what happens when we do truncate the distribution:
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
myManyValues <- c(outer((0:9999)/100000,myvalues,"+"))
a <- 29
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myManyValues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myManyValues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.2059, p-value = 0.05683
alternative hypothesis: two-sided
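If truncdist is not available, the same left-truncated Weibull can be sketched in base R. The functions below are hypothetical helpers (ptrunc_weibull and rtrunc_weibull are our own names, not part of truncdist); they assume truncation from the left at a with no upper bound, and sample by the inverse-CDF method, the same idea rtrunc uses: map uniform draws onto the interval (F(a), 1) and invert the untruncated Weibull CDF.

```r
# Hypothetical base-R helpers for a Weibull left-truncated at `a`
# (no upper truncation). Not part of truncdist.
ptrunc_weibull <- function(q, a, shape, scale) {
  Fa <- pweibull(a, shape = shape, scale = scale)   # mass removed below a
  p <- (pweibull(q, shape = shape, scale = scale) - Fa) / (1 - Fa)
  pmax(p, 0)                                         # CDF is 0 below a
}

rtrunc_weibull <- function(n, a, shape, scale) {
  Fa <- pweibull(a, shape = shape, scale = scale)
  u <- runif(n)                                      # uniform draws on (0, 1)
  # Map u onto (Fa, 1) and invert the untruncated Weibull CDF
  qweibull(Fa + u * (1 - Fa), shape = shape, scale = scale)
}

set.seed(1)
scale <- 36.37516
shape <- 9.437013
y_trunc <- rtrunc_weibull(1e5, a = 29, shape = shape, scale = scale)
summary(y_trunc)   # the minimum should not fall below the truncation point 29
```

Setting a = -Inf gives back the ordinary Weibull, because pweibull(-Inf) is 0.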

You are using the ptrunc function incorrectly (I assume); it needs to be given a sequence of quantiles. Below I calculate the mean and standard deviation of the Weibull from your scale and shape parameters, then evaluate the truncated CDF on a grid from 5 standard deviations below to 5 standard deviations above the mean to produce a comparison set.
require(truncdist);
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4);
a <- 7;
scale <- 36.37516;
shape <- 9.437013;
# Calculate the mean and standard deviation of the Weibull
weib_mean <- scale * gamma(1 + 1/shape)
weib_sd <- sqrt((scale^2) * (gamma(1 + 2/shape) - (gamma(1 + 1/shape))^2))
# Evaluate the truncated CDF on a grid of quantiles
quant <- seq(weib_mean - 5 * weib_sd, weib_mean + 5 * weib_sd, length.out = 1E5)
weibull_samp <- ptrunc(quant, "weibull", a = a, scale = scale, shape = shape)
# Take a look
plot(weibull_samp ~ quant)
# Use with test
> ks.test(sort(myvalues), weibull_samp)
Two-sample Kolmogorov-Smirnov test
data: sort(myvalues) and weibull_samp
D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided
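ks.test() also accepts a cumulative distribution function as its second argument (either a name such as "pweibull" or an actual function) and passes any further arguments on to it, so a one-sample test against the truncated distribution needs no simulated comparison sample. The sketch below uses a hypothetical base-R helper, ptrunc_weibull (our own name, not truncdist's). With truncdist loaded, ks.test(myvalues, ptrunc, "weibull", a = a, scale = scale, shape = shape) should be equivalent, since ptrunc takes the quantiles first and the distribution name second (not run here).

```r
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,
              29.9, 30.1, 34.7, 38.7, 32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0,
              34.2, 40.0, 36.1, 41.5, 32.8, 31.9, 41.3, 30.5, 39.9, 35.0, 31.2, 35.0,
              30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
a <- 7
scale <- 36.37516
shape <- 9.437013

# Hypothetical helper: CDF of the Weibull left-truncated at a (not truncdist)
ptrunc_weibull <- function(q, a, shape, scale) {
  Fa <- pweibull(a, shape = shape, scale = scale)
  pmax((pweibull(q, shape = shape, scale = scale) - Fa) / (1 - Fa), 0)
}

# One-sample KS test: ks.test() evaluates
# ptrunc_weibull(sort(myvalues), a = a, shape = shape, scale = scale).
# suppressWarnings() hides R's warning about ties (myvalues has repeated values).
ks_trunc <- suppressWarnings(
  ks.test(myvalues, ptrunc_weibull, a = a, shape = shape, scale = scale)
)
ks_plain <- suppressWarnings(
  ks.test(myvalues, "pweibull", shape = shape, scale = scale)
)
ks_trunc
```

With a = 7 the truncation removes a negligible amount of probability mass, so D should essentially match the untruncated pweibull test; a larger a (for example 29) changes the result.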

Related

Solving a linear model for a known value of y in R

I have a series of x and y values that I've used to build a linear model.
I can use predict() to find a value of y from a known value of x, but I'm struggling to calculate x from a known value of y. I've seen a few posts that suggest the approx() function, but I can't figure out how to apply it to my use case. The idea is to write a function that takes a numerical value of y as input and returns the expected value of x it corresponds to, ideally with a prediction interval, e.g. "The expected value of x is 38.90, plus or minus 0.7", or something like that.
Here's my data:
> dput(x)
c(4.66, 5.53, 5.62, 5.85, 6.26, 6.91, 7.04, 7.32, 7.43, 7.85,
8.1, 8.3, 8.34, 8.53, 8.69, 8.7, 8.73, 8.76, 8.96, 9.06, 9.42,
9.78, 10.3, 10.82, 10.98, 11.07, 11.09, 11.32, 11.75, 12.1, 12.46,
12.5, 12.99, 13.02, 13.28, 13.43, 13.96, 14, 14.07, 14.29, 14.57,
14.66, 15.21, 15.56, 15.97, 16.44, 16.8, 17.95, 18.33, 18.62,
18.92, 19.49, 19.9, 19.92, 20.14, 20.18, 21.19, 22.7, 23.25,
23.48, 23.49, 23.58, 23.7, 23.83, 23.83, 23.97, 24.05, 24.14,
24.15, 24.19, 24.32, 24.62, 24.9, 24.92, 25, 25.06, 25.31, 25.36,
25.86, 25.9, 25.95, 25.99, 26.08, 26.2, 26.27, 26.39, 26.5, 26.51,
26.68, 26.78, 26.82, 26.92, 26.92, 27.05, 27.05, 27.07, 27.32,
27.6, 27.77, 27.8, 27.91, 27.96, 27.97, 28.04, 28.05, 28.15,
28.2, 28.28, 28.37, 28.51, 28.53, 28.53, 28.66, 28.68, 28.72,
28.74, 28.82, 28.83, 28.83, 28.86, 28.89, 28.91, 29.04, 29.2,
29.35, 29.4, 29.42, 29.48, 29.53, 29.65, 29.67, 29.69, 29.7,
29.72, 29.93, 29.97, 30.03, 30.08, 30.09, 30.11, 30.18, 30.62,
30.66, 30.78, 31, 31.32, 31.43, 31.47, 31.69, 31.96, 32.33, 32.5,
32.5, 32.58, 32.7, 32.92, 33.2, 33.6, 33.72, 33.77, 33.95, 34.02,
34.08, 34.42, 34.79, 34.91, 34.99, 35.08, 35.15, 35.49, 35.6,
35.6, 35.74, 35.8, 36.05, 36.17, 36.3, 36.37, 36.84, 37.31, 37.95,
38.75, 38.78, 38.81, 38.9, 39.21, 39.31, 39.5, 42.68, 43.92,
43.95, 44.64, 45.7, 45.95, 46.25, 46.8, 49.08, 50.33, 51.23,
52.76, 53.06, 62)
> dput(y)
c(11.91, 13.491, 13.708, 13.984, 14.624, 15.688, 15.823, 16.105,
16.387, 17.004, 17.239, 17.498, 17.686, 17.844, 17.997, 18.044,
18.003, 18.191, 18.332, 18.25, 18.778, 19.237, 19.693, 20.177,
20.441, 20.876, 20.512, 20.894, 21.493, 21.539, 21.951, 21.763,
22.498, 22.451, 22.744, 22.785, 23.409, 23.314, 23.408, 23.567,
23.849, 23.978, 24.472, 24.678, 25.236, 25.547, 25.676, 26.81,
26.83, 27.275, 27.331, 27.844, 28.009, 28.244, 28.497, 28.555,
29.067, 30.412, 30.788, 30.965, 31.058, 31.423, 31.346, 31.118,
31.252, 31.258, 31.399, 31.605, 31.552, 31.881, 31.822, 31.91,
32.333, 32.174, 32.222, 32.704, 32.445, 32.557, 32.993, 32.845,
32.997, 32.909, 32.911, 33.121, 33.191, 33.156, 33.426, 33.332,
33.52, 33.526, 33.697, 33.379, 33.849, 33.726, 33.538, 33.885,
33.961, 34.284, 34.208, 33.896, 34.278, 34.355, 34.276, 34.267,
34.399, 34.507, 34.492, 34.531, 34.695, 34.642, 34.872, 34.772,
34.813, 34.942, 34.883, 34.948, 34.719, 34.983, 34.99, 35.136,
35.007, 34.026, 35.148, 35.201, 35.459, 35.418, 35.236, 35.411,
35.459, 35.5, 35.665, 35.724, 35.636, 35.667, 35.747, 35.788,
35.882, 35.9, 35.83, 36.106, 36.029, 36.364, 36.358, 36.517,
37.005, 36.74, 36.963, 36.634, 37.04, 37.48, 37.581, 37.78, 37.686,
38.262, 37.998, 37.986, 38.498, 39.296, 38.467, 38.779, 38.885,
38.72, 39.038, 38.932, 39.719, 39.654, 39.367, 40.072, 39.707,
39.742, 39.919, 40.054, 40.189, 40.197, 40.154, 40.383, 42.146,
40.595, 40.971, 41.441, 41.964, 42.328, 42.463, 42.627, 42.633,
42.721, 42.786, 42.857, 45.318, 45.665, 46.406, 46.335, 47.663,
47.181, 48.074, 48.109, 49.931, 50.377, 51.053, 52.451, 53.004,
65.889)
> model <- lm(y ~ poly(x,3,raw=TRUE))
> model
Call:
lm(formula = y ~ poly(x, 3, raw = TRUE))
Coefficients:
(Intercept) poly(x, 3, raw = TRUE)1 poly(x, 3, raw = TRUE)2 poly(x, 3, raw = TRUE)3
6.6096981 1.4736619 -0.0238935 0.0002445
Since you have fitted a low order polynomial in ordinary form (raw = TRUE), you can use polyroot to directly find x given y.
## pc: polynomial coefficients in increasing order
solvePC <- function (pc, y) {
pc[1] <- pc[1] - y
## all roots, including complex ones
roots <- polyroot(pc)
## keep real roots
Re(roots)[abs(Im(roots)) / Mod(roots) < 1e-10]
}
y0 <- 38.9 ## example y-value
x0 <- solvePC(coef(model), y0)
#[1] 34.28348
plot(x, y, col = 8)
lines(x, fitted(model), lwd = 2)
abline(h = y0)
abline(v = x0)
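To get an interval estimate, we can use sampling: draw the regression coefficients from their estimated joint normal distribution and solve for x with each draw.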
## polyfit: an ordinary polynomial regression model fitted by lm()
rootCI <- function (polyfit, y, nSamples = 1000, level = 0.05) {
## sample regression coefficients from their joint distribution
pc <- MASS::mvrnorm(nSamples, coef(polyfit), vcov(polyfit))
## for each row (a sample), call solvePC()
roots <- apply(pc, 1, solvePC, y)
## confidence interval
quantile(roots, probs = c(0.5 * level, 1 - 0.5 * level))
}
## 95% confidence interval
rootCI(model, y = y0)
# 2.5% 97.5%
#34.17981 34.38828
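rootCI() reflects only the uncertainty in the coefficients, so it is a confidence interval for where the mean curve reaches y0. The question also asks for something closer to a prediction interval. One common approach (inverse prediction, as used in calibration) is to find where the upper and lower prediction bounds cross y0. The sketch below runs on simulated data (an assumed noise sd of 0.7 around a cubic with roughly the question's coefficients), not on the question's data, and band() is a hypothetical helper.

```r
# Illustrative simulated data (not the question's data): a monotone cubic
# close to the question's fitted coefficients, plus noise with an assumed sd of 0.7
set.seed(42)
x <- sort(runif(200, 5, 60))
y <- 6.61 + 1.474 * x - 0.0239 * x^2 + 0.0002445 * x^3 + rnorm(200, sd = 0.7)
model <- lm(y ~ poly(x, 3, raw = TRUE))

# Hypothetical helper: fitted value or 95% prediction bound ("fit", "lwr", "upr")
band <- function(x0, which) {
  unname(predict(model, newdata = data.frame(x = x0),
                 interval = "prediction", level = 0.95)[, which])
}

y0 <- 38.9
# Point estimate: where the fitted curve crosses y0
x_hat <- uniroot(function(x0) band(x0, "fit") - y0, c(5, 60))$root
# Inverse prediction limits: where each prediction bound crosses y0
# (for an increasing curve, the upper bound reaches y0 first)
x_lower <- uniroot(function(x0) band(x0, "upr") - y0, c(5, 60))$root
x_upper <- uniroot(function(x0) band(x0, "lwr") - y0, c(5, 60))$root
c(lower = x_lower, estimate = x_hat, upper = x_upper)
```

This relies on the fitted curve being increasing over the search interval; for a non-monotone fit each bound can cross y0 more than once.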
You can use optim:
Predict the y values given x:
pred_y <- function(x)predict(model, data.frame(x))
pred_y(x = 10)
[1] 19.20145
Now to predict x given y, we do:
pred_x <- function(y) optim(1, \(x) (y-pred_y(x))^2, method='BFGS')[[1]]
pred_x(19.20145)
[1] 10
The uniroot function is intended for this type of problem.
#coefficients for the model
coeff <- c(6.6096981, 1.4736619, -0.0238935, 0.0002445)
#define the equation which one needs the root of
modely <- function(x, y) {
# could use the predict function here
my <- coeff[1] + coeff[2]*x + coeff[3]*x^2 + coeff[4]*x^3
y-my
}
#use the uniroot function
#In this example y=10
uniroot(modely, lower=-100, upper=100, y=10)
$root
[1] 2.391022
$f.root
[1] -1.208443e-08
$iter
[1] 10
$init.it
[1] NA
$estim.prec
[1] 6.103516e-05
In this case for y=10, x = 2.391022
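uniroot() returns one root inside the bracket [-100, 100] but does not say whether others exist. For this particular cubic, a quick check with base R's polyroot() (as in the first answer) should show that the derivative has no real roots, so the curve is strictly increasing and y = 10 has exactly one solution. A minimal sketch using the same hard-coded coefficients:

```r
coeff <- c(6.6096981, 1.4736619, -0.0238935, 0.0002445)
y0 <- 10

# All roots of (coeff[1] - y0) + coeff[2] x + coeff[3] x^2 + coeff[4] x^3
r <- polyroot(c(coeff[1] - y0, coeff[-1]))
real_roots <- Re(r)[abs(Im(r)) < 1e-8]
real_roots

# Roots of the derivative coeff[2] + 2 coeff[3] x + 3 coeff[4] x^2:
# if none is real (and coeff[4] > 0), the cubic is strictly increasing
d <- polyroot(c(coeff[2], 2 * coeff[3], 3 * coeff[4]))
any(abs(Im(d)) < 1e-8)
```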

How to calculate the 99th percentile of a dataset

I have a dataset of mean weights for two sample sizes, with 100,000 tests. I am trying to find the 99th percentile, but I do not understand how to do so. I have found the median and quartiles by doing the following:
summary(Lifts)
Large Small
Min. : 62.5 Min. : 54.2
1st Qu.: 99.1 1st Qu.: 96.0
Median :106.0 Median :106.0
Mean :106.0 Mean :106.0
3rd Qu.:112.9 3rd Qu.:116.0
Max. :147.5 Max. :156.8
I need to find the 99th percentile of both Large and Small. I have tried using the quantile command:
quantile(Lifts, probs = c(0, 0.25, 0.50, 0.99))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
undefined columns selected
But I receive that error. Any help would be appreciated.
quantile() expects a numeric vector; given a data frame, it fails while sorting it. If we specify the column (for example using $ notation), we get rid of the error:
quantile(Lifts$Large, probs = c(0, 0.25, 0.50, 0.99))
Or
quantile(Lifts$Small, probs = c(0, 0.25, 0.50, 0.99))
Generally, to apply a function to every column of a data frame, we can use lapply, which also works with quantile. Here it is applied to the example data lifts given under "Data":
lapply(lifts, quantile, probs=c(0, 0.25, 0.50, 0.99))
# $large
# 0% 25% 50% 99%
# 14.400 161.675 488.450 950.506
#
# $small
# 0% 25% 50% 99%
# 0.900 30.800 43.650 97.744
We may also use sapply, which does the same but returns a matrix.
sapply(lifts, quantile, probs=c(0, 0.25, 0.50, 0.99))
# large small
# 0% 14.400 0.900
# 25% 161.675 30.800
# 50% 488.450 43.650
# 99% 950.506 97.744
Data
lifts <- structure(list(large = c(489.9, 734.5, 905.6, 41.9, 950.2, 73.9,
444.7, 950.8, 303.9, 539, 399.4, 429.5, 670.2, 39.1, 324.6, 829.6,
97.9, 216.6, 500.1, 364.4, 762.6, 205.7, 191.6, 128.6, 749.2,
185, 736.9, 46.9, 114.2, 774.4, 626.5, 42.5, 52.5, 724.3, 518.3,
932.7, 602.5, 14.4, 794.9, 149.7, 621.6, 674.2, 685.1, 153.9,
42.3, 487, 787.5, 351.6, 689.3, 862.3), small = c(56.5, 63.6,
49.5, 76.7, 78, 25.8, 57.8, 19.2, 27.7, 38.3, 36.4, 4.4, 89.2,
68.8, 36.1, 71.8, 69.1, 35.8, 38.2, 26.9, 95.5, 30.7, 43.2, 58.8,
44.1, 35.4, 91.2, 37.1, 99.9, 94.5, 52, 38.2, 40.1, 50.9, 81.7,
7.5, 77.5, 71.9, 70.6, 8.2, 90.1, 31.1, 3.4, 52, 0.9, 30.5, 12.7,
45.6, 34.2, 13.4)), class = "data.frame", row.names = c(NA, -50L
))
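To see what quantile() computes for the 99th percentile: with R's default type = 7, for n sorted values it uses the position h = (n - 1) p + 1 and interpolates linearly between the neighbouring order statistics. The sketch below re-implements that rule as a hypothetical helper, q7(), and checks it against quantile() on simulated data with 100,000 rows (a stand-in for the question's Lifts, which we do not have).

```r
# Hypothetical helper: R's default (type 7) sample quantile for one probability p
q7 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1          # fractional position among the order statistics
  lo <- floor(h)
  hi <- min(lo + 1, n)          # stay inside the vector when p = 1
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

q7(1:100, 0.99)                 # 99.01, same as quantile(1:100, 0.99)

# Simulated stand-in for a 100,000-row data frame with two columns
set.seed(1)
sim <- data.frame(Large = rnorm(1e5, 106, 10), Small = rnorm(1e5, 106, 15))
sapply(sim, quantile, probs = 0.99)
sapply(sim, q7, p = 0.99)
```

quantile() offers other definitions through its type argument; they can differ slightly in small samples but are practically identical with 100,000 values.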

Find the optimal degree of a polynomial using an ANOVA test in R

data=structure(list(Y = c(31.2, 33.3, 35.6, 36.4, 36.7, 38.4, 40.4,
40.3, 41.8, 40.4, 40.7, 40.1, 42.7, 44.1, 46.7, 50.6, 50.1, 51.7,
52.9), X1 = c(492.9, 528.6, 560.3, 624.6, 666.4, 717.8, 768.2,
843.3, 911.6, 931.1, 1021.5, 1165.9, 1349.6, 1449.4, 1575.5,
1759.1, 1994.2, 2258.1, 2478.7), X2 = c(37.3, 38.1, 39.3, 37.8,
38.4, 40.1, 38.6, 39.8, 39.7, 52.1, 48.9, 58.3, 57.9, 56.5, 63.7,
61.6, 58.9, 66.4, 70.4), X3 = c(54.7, 63.7, 69.8, 65.9, 64.5,
70, 73.2, 67.8, 79.1, 95.4, 94.2, 123.5, 129.9, 117.6, 130.9,
129.8, 128, 141, 168.2), X4 = c(77.4, 80.2, 80.4, 83.9, 85.5,
93.7, 106.1, 104.8, 114, 124.1, 127.6, 142.9, 143.6, 139.2, 165.5,
203.3, 219.6, 221.6, 232.6)), class = "data.frame", row.names = c(NA,
-19L))
I want to build several polynomial regressions using polynomials of varying degrees. But how do I choose the optimal degree of the polynomial for modeling this dependence using the ANOVA test?
I use this approach:
fit1 <- lm(Y ~ X1 + I(X1^2)+X2 + I(X2^3)+X3 + I(X3^2)+X4 + I(X4^2), data=data)
summary(fit1)
AIC(fit1)
But how can I use a loop to determine the optimal degree of the polynomial?
I consider the optimal degree to be the one with the smallest AIC and the largest F-value.
Using stepwise regression and the AIC criterion:
fm <- lm(Y ~ .^3, data)
step(fm)
giving:
Call:
lm(formula = Y ~ X1 + X2 + X3 + X4 + X1:X2 + X1:X3 + X1:X4 +
X2:X3 + X2:X4 + X3:X4 + X1:X2:X4 + X2:X3:X4, data = data)
Coefficients:
(Intercept) X1 X2 X3 X4 X1:X2
5.204e+00 1.388e-02 -2.004e+00 2.701e+00 -1.757e-01 -2.628e-03
X1:X3 X1:X4 X2:X3 X2:X4 X3:X4 X1:X2:X4
1.246e-03 7.076e-04 -2.836e-02 4.470e-02 -3.541e-02 -1.244e-05
X2:X3:X4
2.782e-04
There is also stepAIC in the MASS package.
A different approach is lasso regression.
library(glmnet)
X <- model.matrix(fm)
fm2 <- cv.glmnet(X, data$Y)
coef(fm2, s = fm2$lambda.min)
giving:
(Intercept) 2.829759e+01
(Intercept) .
X1 9.319929e-03
X2 -8.658000e-02
X3 .
X4 6.620549e-02
X1:X2 .
X1:X3 .
X1:X4 .
X2:X3 .
X2:X4 .
X3:X4 .
X1:X2:X3 .
X1:X2:X4 .
X1:X3:X4 -8.318340e-08
X2:X3:X4 .
Check out this link for a variety of approaches https://rpubs.com/McCloud77/300677
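For the loop the question asks about, one option is to fit one model per degree, collect AIC() for each, and pass the nested fits to anova() for sequential F-tests. The sketch below uses the built-in cars data (one predictor) purely for illustration; for the question's data the formula would need one poly() term per predictor, and with only 19 observations high degrees use up degrees of freedom quickly. AIC and the F-tests need not agree on the same degree.

```r
# Illustration on the built-in cars data (one predictor)
degrees <- 1:4
fits <- lapply(degrees, function(d) {
  f <- reformulate(sprintf("poly(speed, %d)", d), response = "dist")
  lm(f, data = cars)
})

# AIC for each degree: smaller is better
aics <- setNames(sapply(fits, AIC), paste("degree", degrees))
aics
names(which.min(aics))

# Sequential F-tests: row k compares degree k with degree k - 1
tab <- do.call(anova, fits)
tab
```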

How can I get standard errors for my 4 parameters when the Hessian matrix from solnp is 5 by 5?

I'm using the solnp() function in the R package Rsolnp to solve a nonlinear regression with constraints. It works well, converges with no problem. I want to use the Hessian matrix to calculate standard errors of the four parameter estimates, but the Hessian is not 4 by 4 as I had expected, but 5 by 5. I looked around on SO and didn't see anyone else with an unexpected Hessian size. All the examples I found with the Hessians printed showed them to be the expected size of p by p (e.g., 2x2, 3x3, and 4x4).
How can I get standard errors for my 4 parameters from this 5 by 5 Hessian?
df <- data.frame(
Recruit.N = c(78.4, 79.8, 106, 57.4, 81.7, 94.4, 74.1, 42, 61.6, 47.7, 61.8,
28.1, 32.3, 19, 23.4, 20.1, 27),
Stock.5 = c(66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4, 81.7, 94.4,
74.1, 42, 61.6, 47.7, 61.8, 28.1),
Stock.6 = c(25.2, 66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4, 81.7,
94.4, 74.1, 42, 61.6, 47.7, 61.8),
Stock.7 = c(23.8, 25.2, 66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4,
81.7, 94.4, 74.1, 42, 61.6, 47.7)
)
lossfcn <- function(parz, mydat) {
alpha <- parz[[1]]
beta <- parz[[2]]
p5 <- parz[[3]]
p6 <- parz[[4]]
p7 <- 1 - p5 - p6
S <- with(mydat, p5*Stock.5 + p6*Stock.6 + p7*Stock.7)
Obs <- mydat$Recruit.N
Pred <- alpha * S * exp(-beta*S)
Resid <- log(Obs) - log(Pred)
sigma <- sqrt(mean(Resid^2))
LL <- dlnorm(Obs, meanlog=log(Pred), sdlog=sigma, log=TRUE)
-sum(LL)
}
inequal <- function(parz, mydat) {
parz[3] + parz[4]
}
library(Rsolnp)
solnp(pars=c(1, 0.008, 1/3, 1/3), fun=lossfcn, mydat=df,
ineqfun=inequal, ineqLB=0, ineqUB=1,
LB=c(0, 0, 0, 0), UB=c(1000, 1000, 1, 1), control=list(trace=0))
$pars
[1] 6.731317e-01 1.888572e-10 8.141363e-01 1.858631e-01
$convergence
[1] 0
$values
[1] 79.87150 75.50927 75.50927 75.50927
$lagrange
[,1]
[1,] -2.028222
$hessian
[,1] [,2] [,3] [,4] [,5]
[1,] 0.3350868 -3.359077e-01 17.84919 -0.4306057 -0.3382811
[2,] -0.3359077 1.993956e+02 -10161.63351 -7.0844295 -2.2749785
[3,] 17.8491854 -1.016163e+04 548099.69224 -85.9544831 -224.0362766
[4,] -0.4306057 -7.084429e+00 -85.95448 25.1086694 5.8817704
[5,] -0.3382811 -2.274979e+00 -224.03628 5.8817704 4.1978178
$ineqx0
[1] 0.9999995
$nfuneval
[1] 142
$outer.iter
[1] 3
$elapsed
Time difference of 0.03016496 secs
$vscale
[1] 1 1 1 1 1 1
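Unlike the 3 posts you linked, you have an inequality constraint. Check ineqx0 in the returned values: the other posts have NULL, but you have 0.9999995. With an inequality constraint, solnp adds a slack variable, so the problem is augmented, and the Hessian returned is for this augmented set of parameters (1 slack + 4 parameters = 5). Rsolnp places the slack variables before the original parameters, so retain the last 4 x 4 submatrix of the Hessian, hessian[2:5, 2:5], for your wanted parameters. This matches the printed matrix: the very large [3, 3] entry belongs to beta, which multiplies S inside exp().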
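A minimal sketch of the standard-error computation, assuming Rsolnp stores the inequality slack variables before the original parameters (its returned pars drop those leading slack entries), so the parameter block is the trailing 4 x 4 one. The helper se_from_solnp_hessian is our own hypothetical name; it is applied to the Hessian printed in the question, rounded as displayed. With a fitted object fit <- solnp(...), the call would be se_from_solnp_hessian(fit$hessian, length(fit$pars)).

```r
# The 5 x 5 Hessian as printed in the question (values rounded by print())
H <- matrix(c(
   0.3350868,  -0.3359077,    17.84919,   -0.4306057,   -0.3382811,
  -0.3359077,  199.3956,   -10161.63351,  -7.0844295,   -2.2749785,
  17.8491854, -10161.63,   548099.69224, -85.9544831, -224.0362766,
  -0.4306057,  -7.084429,     -85.95448,  25.1086694,    5.8817704,
  -0.3382811,  -2.274979,    -224.03628,   5.8817704,    4.1978178),
  nrow = 5, byrow = TRUE)

# Hypothetical helper: drop the leading slack rows/columns, then invert the
# remaining block to get approximate standard errors
se_from_solnp_hessian <- function(H, n_par) {
  n_slack <- nrow(H) - n_par           # here 5 - 4 = 1 slack variable
  idx <- (n_slack + 1):nrow(H)         # original parameters follow the slacks
  sqrt(diag(solve(H[idx, idx])))
}

se <- se_from_solnp_hessian(H, n_par = 4)
setNames(se, c("alpha", "beta", "p5", "p6"))
```

Treat the results as approximate. In particular, beta sits essentially on its lower bound (1.9e-10 with LB = 0), where Hessian-based standard errors are not reliable.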

R: How to or should I drop an insignificant orthogonal polynomial basis in a linear model?

I have soil moisture data with x-, y- and z-coordinates like this:
gue <- structure(list(x = c(311939.1507, 311935.4607, 311924.7316, 311959.553,
311973.5368, 311953.3743, 311957.9409, 311948.3151, 311946.7169,
311997.0803, 312017.5236, 312006.0245, 312001.5179, 311992.7044,
311977.3076, 311960.4159, 311970.6047, 311957.2564, 311866.4246,
311870.8714, 311861.4461, 311928.7096, 311929.6291, 311929.4233,
311891.2915, 311890.3429, 311900.8905, 311864.4995, 311870.8143,
311866.9257, 312002.571, 312017.816, 312004.5024, 311947.1186,
311943.0152, 311952.2695, 311920.6095, 311929.8371, 311918.6095,
312011.9019, 311999.5755, 312011.1461, 311913.7251, 311925.3459,
311944.4701, 311910.2079, 311908.7618, 311896.0776, 311864.4814,
311856.9027, 311857.5747, 311967.3779, 311962.2024, 311956.8318,
311977.5254, 311971.1776, 311982.537, 311993.4709, 312004.6407,
312015.6118, 311990.8601, 311994.686, 311988.3037, 311990.518,
311986.3918, 311998.8876, 311923.9157, 311903.4563, 311915.714,
311856.9087, 311858.9812, 311874.5867, 311963.9099, 311938.4542,
311945.9505, 311804.3039, 311797.2571, 311791.6967, 311921.3965,
311928.9353, 311920.0597, 311833.5109, 311829.8683, 311847.6261,
311889.1243, 311902.4909, 311901.245, 311981.1118, 312005.7098,
311976.5858, 311819.8901, 311816.4143, 311819.4172, 311870.418,
311873.2656, 311888.3401, 311910.8377, 311897.6697, 311902.4571,
311846.8196, 311833.6235, 311846.2942, 311931.3916, 311930.1891,
311947.659, 311792.2642, 311793.2539, 311794.1931, 311795.1288,
311796.0806, 311797.0142, 311797.95, 311798.8822, 311799.8229,
311800.7774, 311801.7094, 311802.6395, 311803.583, 311804.5185,
311805.4558, 311806.391, 311807.3346, 311808.2757, 311809.2187,
311810.1549, 311811.1014, 311812.0366, 311812.9667, 311813.9107,
311814.8373, 311815.7777, 311816.7365, 311817.6522, 311818.6091,
311819.5335, 311820.4961, 311821.4337, 311822.3855, 311823.3195,
311824.2713, 311825.214, 311826.1705, 311827.1188, 311828.0501,
311828.9893, 311829.9324, 311830.8706, 311831.8181, 311832.7667,
311833.705, 311834.6546, 311835.609, 311836.5527, 311837.5157,
311838.4495, 311839.3926, 311840.3423, 311841.2799, 311842.2288,
311843.1691, 311844.118, 311845.0746, 311846.019, 311846.9709,
311847.9201, 311848.859, 311849.8105, 311850.7503, 311851.6889,
311852.6355, 311853.6045, 311854.5296, 311855.4717, 311856.4171,
311857.3759, 311858.3151, 311859.2604, 311860.2178, 311861.1636,
311862.1071, 311863.0347, 311863.9857, 311864.9316, 311865.8722,
311866.8158, 311867.7702, 311868.7155, 311869.649, 311870.6018,
311871.5449, 311872.4871, 311873.4352, 311874.385, 311875.3042,
311876.2617, 311877.2068, 311878.1429, 311879.0956, 311880.0401,
311880.9822, 311881.929, 311882.8651, 311883.8017, 311884.7429,
311885.6949, 311886.6349, 311887.7207, 311888.6653, 311889.6041,
311890.5358, 311891.4838, 311892.4292, 311893.3736, 311894.326,
311895.2703, 311896.2182, 311897.1635, 311898.1032, 311899.0496,
311899.9967, 311900.9456, 311901.8889, 311902.8162, 311903.7566,
311904.6996, 311905.6627, 311906.5899, 311907.5448, 311908.4856,
311909.4399, 311910.3649, 311911.3188, 311912.2629, 311913.2022,
311914.1527, 311915.1025, 311916.0425, 311916.985, 311917.9254,
311918.8661, 311919.8174, 311920.7668, 311921.7026, 311922.6517,
311923.5949, 311924.5252, 311925.4599, 311926.422, 311927.3646,
311928.3, 311929.2432, 311930.1796, 311931.1358, 311932.0726,
311933.0069, 311933.9585, 311934.845, 311935.7788, 311936.7193,
311937.6441, 311938.572, 311939.5094, 311940.4666, 311941.4067,
311942.3489, 311943.2712, 311944.2195, 311945.1536, 311946.0927,
311947.0413, 311947.9761, 311948.9082, 311949.8557, 311950.8201,
311951.7616, 311952.7148, 311953.7894, 311954.7289, 311955.6646,
311956.6081, 311957.5588, 311958.4896, 311959.4297, 311960.3761,
311961.3191, 311962.2653, 311963.195, 311964.1501, 311965.0856,
311966.0254, 311966.9739, 311967.9305, 311968.8592, 311971.7861,
311970.758, 311969.8205), y = c(5846548.408, 5846546.489, 5846538.014,
5846525.283, 5846510.302, 5846503.516, 5846529.769, 5846523.06,
5846522.742, 5846512.263, 5846525.347, 5846522.042, 5846537.487,
5846545.587, 5846532.112, 5846425.917, 5846406.543, 5846434.03,
5846500.989, 5846498.286, 5846487.134, 5846488.045, 5846483.29,
5846468.713, 5846534.269, 5846533.527, 5846504.056, 5846453.395,
5846438.43, 5846442.608, 5846406.8, 5846434.58, 5846419.229,
5846441.045, 5846436.903, 5846447.917, 5846460.757, 5846457.428,
5846451.067, 5846445.596, 5846474.031, 5846457.239, 5846532.694,
5846553.938, 5846565.323, 5846446.926, 5846432.549, 5846467.236,
5846473.963, 5846464.78, 5846498.142, 5846458.168, 5846474.018,
5846489.801, 5846559.513, 5846589.975, 5846555.723, 5846553.847,
5846560.066, 5846560.792, 5846455.642, 5846546.374, 5846465.999,
5846432.091, 5846422.061, 5846442.871, 5846485.956, 5846472.811,
5846506.756, 5846416.327, 5846419.623, 5846413.124, 5846587.334,
5846600.116, 5846589.515, 5846463.69, 5846456.712, 5846459.683,
5846600.118, 5846574.99, 5846597.804, 5846419.496, 5846437.615,
5846436.902, 5846567.872, 5846572.857, 5846556.904, 5846388.146,
5846393.088, 5846390.13, 5846481.09, 5846496.127, 5846493.586,
5846545.396, 5846532.126, 5846538.334, 5846388.343, 5846416.117,
5846392.223, 5846513.526, 5846486.644, 5846512.917, 5846395.509,
5846386.421, 5846383.873, 5846459.062, 5846459.36, 5846459.682,
5846460.026, 5846460.377, 5846460.703, 5846461.047, 5846461.378,
5846461.73, 5846462.071, 5846462.418, 5846462.765, 5846463.115,
5846463.466, 5846463.815, 5846464.128, 5846464.505, 5846464.843,
5846465.189, 5846465.52, 5846465.869, 5846466.217, 5846466.557,
5846466.893, 5846467.237, 5846467.586, 5846467.903, 5846468.274,
5846468.601, 5846468.943, 5846469.258, 5846469.592, 5846469.909,
5846470.247, 5846470.565, 5846470.891, 5846471.24, 5846471.536,
5846471.885, 5846472.224, 5846472.553, 5846472.884, 5846473.225,
5846473.532, 5846473.89, 5846474.179, 5846474.502, 5846474.827,
5846475.146, 5846475.448, 5846475.768, 5846476.102, 5846476.428,
5846476.746, 5846477.069, 5846477.37, 5846477.685, 5846478.009,
5846478.335, 5846478.656, 5846478.958, 5846479.299, 5846479.608,
5846479.926, 5846480.267, 5846480.603, 5846480.908, 5846481.246,
5846481.56, 5846481.877, 5846482.19, 5846482.503, 5846482.825,
5846483.144, 5846483.468, 5846483.811, 5846484.13, 5846484.458,
5846484.8, 5846485.125, 5846485.456, 5846485.778, 5846486.112,
5846486.421, 5846486.75, 5846487.08, 5846487.401, 5846487.744,
5846488.067, 5846488.39, 5846488.728, 5846489.067, 5846489.383,
5846489.716, 5846490.054, 5846490.38, 5846490.719, 5846491.044,
5846491.357, 5846491.694, 5846492.005, 5846492.402, 5846492.726,
5846493.045, 5846493.389, 5846493.708, 5846494.049, 5846494.363,
5846494.686, 5846494.982, 5846495.3, 5846495.64, 5846495.957,
5846496.263, 5846496.584, 5846496.911, 5846497.241, 5846497.591,
5846497.914, 5846498.226, 5846498.553, 5846498.893, 5846499.221,
5846499.538, 5846499.869, 5846500.19, 5846500.508, 5846500.82,
5846501.151, 5846501.492, 5846501.827, 5846502.147, 5846502.471,
5846502.803, 5846503.129, 5846503.46, 5846503.783, 5846504.11,
5846504.448, 5846504.76, 5846505.118, 5846505.445, 5846505.79,
5846506.106, 5846506.465, 5846506.795, 5846507.118, 5846507.448,
5846507.758, 5846508.081, 5846508.396, 5846508.645, 5846508.99,
5846509.34, 5846509.685, 5846510.031, 5846510.363, 5846510.693,
5846511.031, 5846511.362, 5846511.694, 5846512.024, 5846512.354,
5846512.701, 5846513.034, 5846513.353, 5846513.683, 5846513.998,
5846514.32, 5846514.636, 5846514.956, 5846515.326, 5846515.65,
5846515.968, 5846516.301, 5846516.634, 5846516.971, 5846517.318,
5846517.64, 5846517.952, 5846518.308, 5846518.626, 5846518.937,
5846519.27, 5846519.597, 5846519.921, 5846520.245, 5846520.581,
5846521.498, 5846521.209, 5846520.893), z = c(26.485, 26.411,
26.339, 27.248, 27.208, 26.799, 27.199, 27.023, 26.973, 26.908,
26.275, 26.474, 26.316, 26.226, 27.184, 25.903, 25.765, 25.931,
26.057, 26.181, 26.102, 26.436, 26.457, 26.396, 25.585, 25.572,
26.448, 25.637, 25.603, 25.634, 25.847, 26.185, 25.899, 26.016,
25.873, 26.299, 26.358, 26.344, 26.088, 26.264, 26.3, 26.306,
26.311, 25.857, 26.004, 25.824, 25.798, 26.326, 26.03, 25.625,
25.78, 26.368, 26.225, 26.582, 26.398, 25.343, 26.253, 25.908,
25.323, 25.381, 26.3, 26.179, 26.284, 26.024, 25.896, 26.251,
26.447, 26.385, 26.419, 25.188, 25.176, 25.169, 25.348, 25.188,
25.291, 25.285, 25.266, 25.262, 25.333, 25.308, 25.314, 25.145,
25.172, 25.22, 25.235, 25.204, 25.286, 25.155, 25.397, 25.202,
25.373, 25.327, 25.341, 25.172, 25.253, 25.318, 25.023, 25.24,
25.132, 25.264, 25.38, 25.221, 25.119, 25.179, 25.083, 25.258,
25.254, 25.235, 25.252, 25.266, 25.256, 25.264, 25.26, 25.262,
25.265, 25.265, 25.285, 25.28, 25.257, 25.254, 25.258, 25.287,
25.294, 25.282, 25.27, 25.268, 25.309, 25.303, 25.3, 25.312,
25.305, 25.3, 25.314, 25.319, 25.328, 25.304, 25.325, 25.308,
25.332, 25.333, 25.333, 25.346, 25.344, 25.339, 25.355, 25.362,
25.36, 25.391, 25.418, 25.434, 25.436, 25.447, 25.486, 25.5,
25.526, 25.552, 25.551, 25.564, 25.589, 25.606, 25.641, 25.672,
25.689, 25.709, 25.736, 25.758, 25.782, 25.836, 25.844, 25.866,
25.88, 25.935, 25.984, 26.037, 26.066, 26.071, 26.094, 26.106,
26.106, 26.118, 26.1, 26.146, 26.135, 26.156, 26.169, 26.162,
26.173, 26.198, 26.196, 26.228, 26.258, 26.276, 26.283, 26.277,
26.236, 26.277, 26.251, 26.264, 26.26, 26.261, 26.249, 26.307,
26.289, 26.243, 26.206, 26.231, 26.224, 26.238, 26.244, 26.245,
26.254, 26.2, 26.229, 26.24, 26.248, 26.223, 26.29, 26.344, 26.371,
26.364, 26.311, 26.343, 26.342, 26.334, 26.317, 26.342, 26.315,
26.312, 26.322, 26.325, 26.324, 26.32, 26.308, 26.329, 26.31,
26.32, 26.327, 26.34, 26.371, 26.442, 26.442, 26.483, 26.504,
26.526, 26.562, 26.562, 26.538, 26.534, 26.533, 26.541, 26.584,
26.642, 26.65, 26.691, 26.719, 26.755, 26.786, 26.794, 26.849,
26.867, 26.919, 26.93, 26.945, 26.947, 26.959, 26.984, 26.992,
27.006, 27.035, 27.021, 27.052, 27.094, 27.104, 27.119, 27.16,
27.182, 27.223, 27.236, 27.267, 27.304, 27.331, 27.348, 27.341,
27.379, 27.355, 27.378, 27.357, 27.373, 27.319, 27.299, 27.278,
27.28, 27.295, 27.288, 27.286, 27.279), soil_m_sat = c(24.1,
24.2, 26.9, 13.9, 20.6, 34.1, 16.2, 16.7, 16, 22.1, 23.9, 27.2,
26.8, 34.4, 26.3, 54.1, 51, 44.9, 46.4, 45.9, 54.7, 39.1, 38.7,
40.7, 56.5, 56.3, 40.6, 60.9, 56.8, 56.3, 40.7, 40.4, 44.1, 44.9,
46.2, 45.3, 46.1, 43.7, 44.9, 45.4, 33.1, 45.8, 27.6, 47.8, 37.3,
58.9, 51.4, 42.1, 46, 66.6, 51.1, 31.6, 48.7, 32.9, 28.1, 84,
37.7, 38.2, 80.4, 73.3, 35.6, 44.2, 39.7, 50.2, 49.9, 37.8, 37,
41.7, 27.3, 100, 100, 100, 80.9, 100, 88.4, 89.6, 93.8, 95.3,
91.9, 93.9, 96.1, 91.4, 100, 94.4, 100, 100, 80, 94.1, 84.4,
91.1, 80, 78.9, 85.9, 100, 97.5, 87.2, 88.6, 83.3, 90.7, 100,
82.2, 100, 96.3, 93.3, 99.6, 92.1, 92.8, 90.9, 92.3, 91.2, 94.5,
91.8, 89.4, 87, 86, 88, 83.7, 88.8, 92.9, 89.3, 83.3, 83.5, 84.5,
85.8, 87.4, 86.5, 82, 78.1, 85.8, 85.6, 88.7, 87.7, 84.9, 82,
87.9, 85.5, 86, 82, 83, 88.5, 81.2, 81.6, 76.5, 77.6, 84.5, 81.5,
82, 82.4, 68, 67.7, 62.1, 68.9, 61.7, 68.5, 68.6, 65.3, 59.5,
60.8, 67.3, 66.2, 59.9, 50.9, 46.9, 44.6, 47.9, 53, 52.1, 48.3,
41.3, 53.8, 51, 47, 53.7, 49.5, 51.1, 44.4, 35.1, 42.2, 41.5,
40, 48.2, 46.7, 48.6, 51.7, 51.2, 52.3, 53.4, 48.9, 50.7, 48.5,
46.5, 39.4, 38, 49.2, 43.6, 47.1, 40.4, 44.7, 45.7, 38.1, 41.9,
39.3, 40.2, 43.8, 47.3, 50.1, 41.2, 39.8, 46, 40.8, 40, 37.8,
42.6, 46, 43.8, 45.4, 42.2, 46.5, 40.4, 39.9, 53, 44.7, 35.8,
42.9, 43.9, 43.2, 40.6, 40.8, 32.2, 32.6, 33.5, 36.7, 34.6, 34.7,
50.9, 35.6, 34.2, 28.1, 42, 32, 42.3, 30, 29.6, 31, 29.8, 26,
37.8, 40, 37, 30.2, 28.2, 26.2, 27.4, 22.1, 28.4, 23.2, 24.8,
26.5, 23.9, 21.1, 27.2, 20.8, 12.5, 14, 17.9, 19.7, 19.4, 26,
16.7, 18.2, 23.9, 19, 25.9, 24.4, 22.1, 19.2, 18.4, 24.7, 17.3,
19.4, 19.6, 17.7, 21.3, 22.1, 17.9, 28.2, 16.3, 25.3, 19.7, 21.7,
19, 18.8, 11.8, 15.6, 9.8, 17.7)), .Names = c("x", "y", "z",
"soil_m_sat"), class = "data.frame", row.names = c(NA, -296L))
In order to estimate a variogram for this data, I need to remove the spatial trend from it. The soil moisture, of course, varies with the surface: the higher a point is, the drier it is. And since this soil moisture data is a percentage, the relationship is hardly linear, which leads me to allow up to cubic dependence of the soil moisture on the z-coordinate. In this area there also happens to be a small, more or less elliptic elevation, so I want to allow the soil moisture to depend on the x- and y-coordinates in a quadratic way. I hope the following model does exactly this:
polymod <- lm(soil_m_sat ~ poly(x + y, degree = 2) + poly(z, degree = 3), data = gue)
summary(polymod)
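Note that poly(x + y, degree = 2) is a polynomial in the single combined variable x + y (two columns), not a general quadratic surface in x and y. If a full quadratic in both coordinates is intended, poly() also accepts several variables. A sketch on simulated coordinates (illustrative only) comparing the two bases; in the model it would read lm(soil_m_sat ~ poly(x, y, degree = 2) + poly(z, degree = 3), data = gue) (not run here).

```r
# Simulated coordinates, for illustration only
set.seed(1)
x <- runif(50)
y <- runif(50)

# Polynomial in the single variable x + y: linear and quadratic columns
P_sum <- poly(x + y, degree = 2)
ncol(P_sum)

# Polynomial in x and y jointly, up to total degree 2: five columns covering
# x, y, x^2, x*y and y^2, labelled by their powers (e.g. "1.1" for x*y)
P_xy <- poly(x, y, degree = 2)
colnames(P_xy)
```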
The summary shows that the first coefficient of the x- and y-dependency (which summary labels poly(x + y, degree = 2)1) is not significant. Because the help page of poly() says that it "returns or evaluates orthogonal polynomials of degree 1 to degree", I thought that removing a degree-one polynomial from the model might be the same as removing the first coefficient of the degree-two polynomial. Therefore I tried to remove it like this:
mod <- lm(soil_m_sat ~ poly(x + y, degree = 2) - poly(x + y, degree = 1) + poly(z, degree = 3), data = gue)
summary(mod)
But the summary of mod looks exactly the same as the summary of polymod, meaning mod does not differ from polymod. How can I remove the non-significant component, then?
No, don't check with summary in this case; use anova. A polynomial term from poly(), or a spline term from bs(), contains more than one coefficient, so it behaves more like a factor variable with multiple levels.
> anova(polymod)
Analysis of Variance Table
Response: soil_m_sat
Df Sum Sq Mean Sq F value Pr(>F)
poly(x + y, degree = 2) 2 113484 56742 1600.8 < 2.2e-16 ***
poly(z, degree = 3) 3 68538 22846 644.5 < 2.2e-16 ***
Residuals 290 10280 35
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The ANOVA table clearly shows that you need all model terms. Do not drop any.
But I still need to answer your question and make you feel happy.
It is possible to drop the poly(x + y, degree = 2)1 term, but you need to work with the model matrix (the poly() basis) directly for that purpose. You may do
gue$XY_poly <- with(gue, poly(x + y, degree = 2))[, 2] ## use the 2nd column only
fit <- lm(soil_m_sat ~ XY_poly + poly(z, degree = 3), data = gue)
summary(fit)
## ...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.3071 0.3459 151.217 < 2e-16 ***
XY_poly -18.8515 7.3894 -2.551 0.0112 *
poly(z, degree = 3)1 -418.1634 6.4937 -64.395 < 2e-16 ***
poly(z, degree = 3)2 116.5327 6.9171 16.847 < 2e-16 ***
poly(z, degree = 3)3 -28.7773 5.9517 -4.835 2.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.951 on 291 degrees of freedom
Multiple R-squared: 0.9464, Adjusted R-squared: 0.9457
F-statistic: 1285 on 4 and 291 DF, p-value: < 2.2e-16
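To test whether the single linear column can really go, compare the reduced and the full model with anova(); on the real data that would be anova(fit, polymod). The sketch below uses simulated data (illustrative only, with s standing in for x + y). For one dropped column this F statistic equals the square of that column's t value in summary(full), so for a single column the two checks agree; anova(polymod) instead tests the whole 2-df x + y term.

```r
# Simulated data, for illustration only (not the gue data)
set.seed(2)
n <- 100
s <- runif(n)          # stands in for x + y
z <- runif(n)
resp <- 50 + 20 * s^2 - 30 * z + rnorm(n, sd = 2)

full <- lm(resp ~ poly(s, 2) + poly(z, 3))

# Reduced model: keep only the 2nd (quadratic) orthogonal column of poly(s, 2)
s_quad <- poly(s, 2)[, 2]
reduced <- lm(resp ~ s_quad + poly(z, 3))

# F-test for dropping the single linear column
tab <- anova(reduced, full)
tab
# For one dropped column, F equals the squared t value from summary(full)
summary(full)$coefficients["poly(s, 2)1", "t value"]^2
```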
