How to calculate the 99th percentile of a dataset - r

I have a dataset of the mean weights for two sample sizes, from 100,000 tests. I am trying to find the 99th percentile but I do not understand how to do so. I have found the median and quartiles by doing the following:
summary(Lifts)
     Large            Small
 Min.   : 62.5   Min.   : 54.2
 1st Qu.: 99.1   1st Qu.: 96.0
 Median :106.0   Median :106.0
 Mean   :106.0   Mean   :106.0
 3rd Qu.:112.9   3rd Qu.:116.0
 Max.   :147.5   Max.   :156.8
I need to find the 99th percentile of both the large and the small samples. I have tried using the quantile function:
quantile(Lifts, probs = c(0, 0.25, 0.50, 0.99))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
undefined columns selected
But I receive that error.
Any help would be appreciated.

If we specify the column (for example using $ notation) we get rid of the error:
quantile(Lifts$Large, probs = c(0, 0.25, 0.50, 0.99))
Or
quantile(Lifts$Small, probs = c(0, 0.25, 0.50, 0.99))

Generally, to apply a function on all columns of a data frame, we can use lapply, which also works with quantile.
lapply(lifts, quantile, probs=c(0, 0.25, 0.50, 0.99))
# $large
#      0%     25%     50%     99%
#  14.400 161.675 488.450 950.506
#
# $small
#     0%    25%    50%    99%
#  0.900 30.800 43.650 97.744
We may also use sapply, which does the same but returns the result as a matrix.
sapply(lifts, quantile, probs=c(0, 0.25, 0.50, 0.99))
#       large  small
# 0%   14.400  0.900
# 25% 161.675 30.800
# 50% 488.450 43.650
# 99% 950.506 97.744
Data
lifts <- structure(list(large = c(489.9, 734.5, 905.6, 41.9, 950.2, 73.9,
444.7, 950.8, 303.9, 539, 399.4, 429.5, 670.2, 39.1, 324.6, 829.6,
97.9, 216.6, 500.1, 364.4, 762.6, 205.7, 191.6, 128.6, 749.2,
185, 736.9, 46.9, 114.2, 774.4, 626.5, 42.5, 52.5, 724.3, 518.3,
932.7, 602.5, 14.4, 794.9, 149.7, 621.6, 674.2, 685.1, 153.9,
42.3, 487, 787.5, 351.6, 689.3, 862.3), small = c(56.5, 63.6,
49.5, 76.7, 78, 25.8, 57.8, 19.2, 27.7, 38.3, 36.4, 4.4, 89.2,
68.8, 36.1, 71.8, 69.1, 35.8, 38.2, 26.9, 95.5, 30.7, 43.2, 58.8,
44.1, 35.4, 91.2, 37.1, 99.9, 94.5, 52, 38.2, 40.1, 50.9, 81.7,
7.5, 77.5, 71.9, 70.6, 8.2, 90.1, 31.1, 3.4, 52, 0.9, 30.5, 12.7,
45.6, 34.2, 13.4)), class = "data.frame", row.names = c(NA, -50L
))
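If only the 99th percentile is needed, the same pattern can be reduced to a single probability. The following is a minimal sketch on a made-up data frame (toy and its values are illustrative assumptions, not the data from the question):

```r
# Toy data frame with two numeric columns (illustrative values only)
toy <- data.frame(Large = 1:101, Small = seq(0, 200, by = 2))

# quantile() expects a numeric vector, so pass one column at a time
p99_large <- quantile(toy$Large, probs = 0.99)

# sapply() applies quantile() to every column and returns a named numeric vector
p99_all <- sapply(toy, quantile, probs = 0.99)

print(p99_large)
print(p99_all)
```

Passing the whole data frame to quantile() fails because the function tries to sort its input as a vector, which is why the column has to be selected first or the call has to go through lapply/sapply.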

Related

How to specify data range for nls function in r?

I am sorry for a possibly trivial question, but unfortunately I haven't found a solution for it. Here is my problem...
I have created a function bm6 with 3 unknown parameters (a, l, p), with which I want to approximate the measured data found in the data frame zz. For fitting I have used the nls model in R.
nls(zz$tuReMa~bm6(zz$Time, t0=30, tau=10, a, l, p), data=zz, start=list(a=0.01, l=0.01, p=0.1))
The model converges on the whole data range, that is from row 1 to row 100, and yields the parameters sought.
Data and fit plot:
https://i.stack.imgur.com/iOn5q.png
Now I want to restrict the data range so that the nls model uses only the data between row 7 and row 37. How can I do that? I have already tried a few things, without success:
nls(zz$tuReMa[7:37]~bm6(zz$Time[7:37], t0=30, tau=10, a, l, p), data=zz, start=list(a=0.01, l=0.01, p=0.1))
(this approach works fine in an lm model)
a different data argument: data=list(zz$Time[7:37],zz$tuReMa[7:37])
the subset argument: subset = c(7:37)
One could also create a new data frame with the values from rows 7:37 and then apply the nls model to the new data frame, but I hope it can be done without this detour.
Additional data:
bm6 <- function(t, t0=30, tau=10, a, l, p) {
ifelse(t<=(tau+t0), 1+a/(p-l)*(exp(-l*(t-t0))/l*(exp(l*(t-t0))-1)-exp(-p*(t-t0))/p*(exp(p*(t-t0))-1)),
1+a/(p-l)*(exp(-l*(t-t0))/l*(exp(l*tau)-1)-exp(-p*(t-t0))/p*(exp(p*tau)-1)))
}
DATA
structure(list(Time = c(0, 5.01, 10.01, 15.02, 20.02, 25.03,
30.03, 35.04, 40.04, 45.05, 50.05, 55.05, 60.06, 66.07, 71.07,
76.08, 81.08, 86.09, 91.09, 96.1, 101.1, 106.11, 111.11, 116.12,
121.12, 126.13, 131.13, 136.14, 142.14, 147.15, 152.15, 157.16,
162.16, 167.17, 172.17, 177.18, 182.18, 187.19, 192.19, 197.2,
202.2, 207.21, 213.21, 218.22, 223.22, 228.23, 233.23, 238.24,
243.24, 248.25, 253.25, 258.26, 263.26, 268.27, 273.27, 278.28,
284.29, 289.29, 294.29, 299.3, 304.3, 309.31, 314.31, 319.32,
324.32, 329.33, 334.34, 339.34, 344.34, 349.35, 355.36, 360.36,
365.36, 370.37, 375.38, 380.38, 385.39, 390.39, 395.39, 400.4,
405.41, 410.41, 415.41, 420.42, 426.43, 431.43, 436.43, 441.44,
446.45, 451.45, 456.46, 461.46, 466.46, 471.47, 476.48, 481.48,
486.48, 491.49, 497.5, 502.5), tu = c(24.8, 16.4, 24.1, 25.8,
20.2, 21, 18.6, 11.8, 21.1, 66.8, 67.4, 72.5, 73.3, 71.6, 72,
65.5, 67.8, 57.1, 61.5, 58.6, 55.9, 60.2, 54.1, 54.6, 52.7, 54.3,
49.8, 49.4, 54.8, 49, 52.4, 50.8, 45.9, 48.4, 48.1, 48.1, 50.5,
44.2, 42.9, 47.3, 51.7, 46.1, 46.9, 44.6, 46.1, 48, 43.2, 38.5,
49.7, 47, 46.9, 51.8, 45, 46.7, 45.8, 39.8, 43.8, 43.3, 45.5,
45.3, 45.9, 38.9, 44.4, 40.8, 40.5, 39.8, 43, 38, 44.7, 42.1,
43, 39.4, 36.6, 44.9, 42.8, 37.2, 41.7, 41.8, 34.7, 44.4, 43.8,
44.7, 44.6, 46.5, 49.7, 42, 36.3, 43.5, 43.7, 41.7, 39.3, 42.5,
45.4, 37.6, 46, 38.5, 39.6, 37.7, 37.9, 39.9), mu = c(26.64,
27.16, 23.43, 24.35, 24.79, 25.4, 25.27, 23.61, 25.36, 27.47,
30.17, 29.94, 28.06, 32.19, 30.96, 35.87, 32.48, 32.41, 33.09,
35.4, 33.68, 33.5, 32.83, 34.19, 32.25, 34.76, 33.69, 33.03,
35.09, 37.13, 36.64, 33.51, 32.91, 33.56, 34.78, 36.06, 33.74,
32.87, 35.57, 36.17, 35.52, 34.43, 33.85, 33.93, 36.69, 34.77,
34.14, 33.46, 34.14, 34.5, 33.03, 33.69, 33.02, 34.23, 33.22,
35.46, 34.28, 31.87, 32.91, 34.25, 33.75, 33.66, 31.08, 32.72,
36.13, 35.3, 32.37, 31.25, 32.98, 34, 34.3, 33.69, 32.33, 33.01,
36.03, 31.59, 34.09, 30.76, 31.8, 32.93, 35.32, 33.69, 31.58,
33.99, 33.67, 33.89, 32.99, 31.17, 32.08, 33.42, 33.91, 34.36,
31.96, 33.27, 31.9, 33.7, 33.16, 30.01, 32.04, 33.59), tuRE = c(1.15043074884029,
0.76076872100729, 1.11795891318754, 1.19681908548708, 0.937044400265076,
0.974155069582505, 0.862823061630219, 0.547382372432074, 0.978793903247184,
3.0987408880053, 3.12657388999337, 3.36315440689198, 3.40026507620941,
3.32140490390987, 3.33996023856859, 3.03843605036448, 3.14512922465209,
2.64877402253148, 2.85288270377734, 2.71835652750166, 2.59310801855533,
2.79257786613651, 2.50960901259112, 2.53280318091451, 2.44466534128562,
2.51888667992048, 2.31013916500994, 2.29158383035123, 2.54208084824387,
2.27302849569251, 2.43074884029158, 2.35652750165673, 2.12922465208747,
2.24519549370444, 2.2312789927104, 2.2312789927104, 2.34261100066269,
2.05036447978794, 1.99005964214712, 2.19416832339298, 2.39827700463883,
2.13850231941683, 2.17561298873426, 2.06891981444665, 2.13850231941683,
2.22664015904573, 2.00397614314115, 1.78595096090126, 2.30550033134526,
2.18025182239894, 2.17561298873426, 2.40291583830351, 2.08747514910537,
2.1663353214049, 2.1245858184228, 1.84625579854208, 2.03180914512922,
2.00861497680583, 2.11066931742876, 2.1013916500994, 2.12922465208747,
1.80450629555997, 2.0596421471173, 1.89264413518887, 1.87872763419483,
1.84625579854208, 1.9946984758118, 1.76275679257787, 2.07355864811133,
1.95294897282969, 1.9946984758118, 1.82770046388337, 1.69781312127237,
2.08283631544069, 1.98542080848244, 1.72564612326044, 1.93439363817097,
1.93903247183565, 1.60967528164347, 2.0596421471173, 2.03180914512922,
2.07355864811133, 2.06891981444665, 2.15705765407555, 2.30550033134526,
1.94831013916501, 1.68389662027833, 2.01789264413519, 2.02717031146455,
1.93439363817097, 1.82306163021869, 1.9715043074884, 2.10603048376408,
1.74420145791915, 2.13386348575215, 1.78595096090126, 1.83697813121272,
1.74884029158383, 1.75811795891319, 1.85089463220676), tuMA = c(24.8,
20.6, 21.7666666666667, 22.775, 22.2733333333333, 21.8533333333333,
20.8866666666667, 17.5066666666667, 18.0466666666667, 34.1333333333333,
47.3133333333333, 59.1, 67.56, 71.3533333333333, 71.9133333333333,
69.96, 68.9, 64.5866666666667, 62.82, 60.76, 58.6933333333333,
58.7, 57.18, 56.0266666666667, 54.7, 54.3, 52.5066666666667,
51.2733333333333, 52.1533333333333, 51.0866666666667, 51.4, 51.3066666666667,
49.5133333333333, 48.7866666666667, 48.3866666666667, 48.0466666666667,
48.7933333333333, 47.46, 45.8066666666667, 45.9866666666667,
47.6866666666667, 47.28, 47.4333333333333, 46.64, 46.2333333333333,
46.54, 45.4933333333333, 43.0733333333333, 44.9466666666667,
45.58, 46.12, 48.3666666666667, 47.7733333333333, 47.3133333333333,
46.7533333333333, 44.2733333333333, 43.6, 43.2933333333333, 43.8333333333333,
44.3866666666667, 45.1733333333333, 43.22, 43.4266666666667,
42.36, 41.5066666666667, 40.74, 41.4466666666667, 40.2133333333333,
41.64, 41.94, 42.4333333333333, 41.5133333333333, 39.9, 41.1466666666667,
41.68, 40.3, 40.8066666666667, 41.1933333333333, 38.8666666666667,
40.4533333333333, 41.7333333333333, 42.8733333333333, 43.78,
45.1333333333333, 46.7666666666667, 45.48, 42.4133333333333,
42.3066666666667, 42.34, 41.8933333333333, 41.18, 41.7133333333333,
42.8, 41.16, 42.7266666666667, 41.5066666666667, 40.7066666666667,
39.4666666666667, 38.8066666666667, 38.7933333333333), tuReMa = c(1.15043074884029,
0.955599734923791, 1.00971946101171, 1.05649436713055, 1.03322288491275,
1.0137397835211, 0.968897724762536, 0.812105146896399, 0.837154848685664,
1.58338855754363, 2.19478683454827, 2.74155069582505, 3.13399602385686,
3.309962447537, 3.3359399160592, 3.24532803180915, 3.19615639496355,
2.99606803622708, 2.91411530815109, 2.81855533465871, 2.72268610558869,
2.72299536116634, 2.65248508946322, 2.59898387453059, 2.53744201457919,
2.51888667992048, 2.43569692953391, 2.37848464766954, 2.41930638391871,
2.36982549149547, 2.3843605036448, 2.38003092555776, 2.2968411751712,
2.26313231720786, 2.24457698254915, 2.22880494808924, 2.26344157278551,
2.20159045725646, 2.12489507400044, 2.13324497459686, 2.2121051468964,
2.19324055666004, 2.20035343494588, 2.1635520212061, 2.14468743096974,
2.15891318754142, 2.11036006185112, 1.99810028716589, 2.08500110448421,
2.1143803843605, 2.13943008614977, 2.24364921581621, 2.21612546940579,
2.19478683454827, 2.16880936602607, 2.05376629114204, 2.02253147779987,
2.00830572122819, 2.03335542301745, 2.05902363596201, 2.09551579412414,
2.00490390987409, 2.01449083278109, 1.96500994035785, 1.92542522641926,
1.88986083499006, 1.92264192622046, 1.86542964435609, 1.93161033797217,
1.9455268389662, 1.96841175171195, 1.92573448199691, 1.85089463220676,
1.90872542522642, 1.93346587143804, 1.86944996686547, 1.89295339076651,
1.91089021426994, 1.80296001767175, 1.87656284515131, 1.9359399160592,
1.98882261983654, 2.03088137839629, 2.09366026065827, 2.16942787718136,
2.10974155069583, 1.96748398497901, 1.96253589573669, 1.96408217362492,
1.94336204992269, 1.91027170311465, 1.93501214932626, 1.98542080848244,
1.90934393638171, 1.98201899712834, 1.92542522641926, 1.88831455710183,
1.83079301965982, 1.80017671747294, 1.79955820631765)), row.names = c(NA,
-100L), class = "data.frame")
I will be really thankful for a solution.
nls itself has a subset argument, e.g. using the built-in CO2 data.frame this uses only the first 10 rows:
nls(uptake ~ a + b * conc, CO2, start = list(a = 0, b = 1), subset = 1:10)
ADDED
Regarding the edit to the question that now presents it in full, the problems are:
- zz should not be part of the formula, because data = zz already supplies the columns
- better starting values are needed
- c(7:37) is the same as 7:37; the c is superfluous
Remove zz from the formula and use the result of the full optimization as the starting values for the subset problem:
fm0 <- nls(tuReMa ~ bm6(Time, t0=30, tau=10, a, l, p), data=zz,
start=list(a=0.01, l=0.01, p=0.1));
fm <- nls(tuReMa~bm6(Time, t0=30, tau=10, a, l, p), data=zz,
start=coef(fm0), subset = 7:37)
fm
giving:
Nonlinear regression model
model: tuReMa ~ bm6(Time, t0 = 30, tau = 10, a, l, p)
data: zz
a l p
0.014206 0.007979 0.049172
residual sum-of-squares: 1.678
Number of iterations to convergence: 23
Achieved convergence tolerance: 9.615e-06
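As a small check of what the subset argument does, the sketch below fits the same model to the built-in CO2 data twice: once with subset = 1:10 and once on a data frame that was cut down to rows 1:10 beforehand. The object names dat, fit_subset and fit_rows are illustrative assumptions.

```r
# Plain data frame built from the built-in CO2 data
dat <- data.frame(conc = CO2$conc, uptake = CO2$uptake)

# Variant 1: let nls() select the rows through its subset argument
fit_subset <- nls(uptake ~ a + b * conc, data = dat,
                  start = list(a = 0, b = 1), subset = 1:10)

# Variant 2: subset the data frame first, then fit
fit_rows <- nls(uptake ~ a + b * conc, data = dat[1:10, ],
                start = list(a = 0, b = 1))

# Both fits see the same 10 rows, so the estimates should agree
print(coef(fit_subset))
print(coef(fit_rows))
```

Both variants only work as intended when the formula refers to bare column names (uptake, conc) rather than to dat$uptake, since subset and data act on the variables looked up in the data frame.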

Overlaying a histogram with normal distribution

I need to draw a histogram of data (weights) overlaid with a line of the expected normal distribution.
I am totally new to R and statistics. I know I probably got something fundamentally wrong about frequencies density and dnorm, but I am stuck.
weights <- c(97.6,95,94.3 ,92.3 ,90.7 ,89.4 ,88.2 ,86.9 ,85.8 ,85.5 ,84.4 ,84.1 ,82.5 ,81.4 ,80.8 ,80 ,79.8 ,79.5 ,78.4 ,78.4 ,78.2 ,78.1 ,78 ,77.4 ,76.5 ,75.4 ,74.8 ,74.1 ,73.5 ,73.2 ,73 ,72.3 ,72.3 ,72.2 ,71.8 ,71.7 ,71.6 ,71.6 ,71.5 ,71.3 ,70.7 ,70.6 ,70.5 ,69.2 ,68.6 ,68.3 ,67.5 ,67 ,66.8 ,66.6 ,65.8 ,65.6 ,64.9 ,64.6 ,64.5 ,64.5 ,64.3 ,64.2 ,63.9 ,63.7 ,62.7 ,62.3 ,62.2 ,59.4 ,57.8 ,57.8 ,57.6 ,56.4 ,53.6 ,53.2 )
hist(weights)
m <- mean(weights)
sd <- sd(weights)
x <- seq(min(weights), max(weights), length.out = length(weights))
xn <- dnorm(x, mean = m, sd = sd) * length(weights) #what is the correct factor???
lines(x, xn)
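I expected the line to follow the histogram approximately, but it is too low in the histogram.
What you need is to plot the histogram on the density scale (relative frequencies instead of counts) and then overlay the estimated density of the weights. Note that density() draws a kernel density estimate, not the fitted normal curve, i.e.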
weights = c(97.6,95,94.3 ,92.3 ,90.7 ,89.4 ,88.2 ,86.9 ,85.8 ,85.5 ,84.4 ,84.1 ,82.5 ,81.4 ,80.8 ,80 ,79.8 ,79.5 ,78.4 ,78.4 ,78.2 ,78.1 ,78 ,77.4 ,76.5 ,75.4 ,74.8 ,74.1 ,73.5 ,73.2 ,73 ,72.3 ,72.3 ,72.2 ,71.8 ,71.7 ,71.6 ,71.6 ,71.5 ,71.3 ,70.7 ,70.6 ,70.5 ,69.2 ,68.6 ,68.3 ,67.5 ,67 ,66.8 ,66.6 ,65.8 ,65.6 ,64.9 ,64.6 ,64.5 ,64.5 ,64.3 ,64.2 ,63.9 ,63.7 ,62.7 ,62.3 ,62.2 ,59.4 ,57.8 ,57.8 ,57.6 ,56.4 ,53.6 ,53.2 )
hist(weights, prob = T)
lines(density(weights), col = "red")
Hope this helps.
The problem in your code is that hist plots frequencies (counts) by default, while dnorm calculates densities.
You can draw the histogram on the density scale by adding freq = FALSE; a dnorm line without the length(weights) factor then matches the histogram:
hist(weights, freq = FALSE)
You're nearly there, you just have to factor in the histogram bin widths.
weights <- c(97.6, 95, 94.3, 92.3, 90.7, 89.4, 88.2, 86.9, 85.8,
85.5, 84.4, 84.1, 82.5, 81.4, 80.8, 80, 79.8, 79.5, 78.4, 78.4,
78.2, 78.1, 78, 77.4, 76.5, 75.4, 74.8, 74.1, 73.5, 73.2, 73,
72.3, 72.3, 72.2, 71.8, 71.7, 71.6, 71.6, 71.5, 71.3, 70.7,
70.6, 70.5, 69.2, 68.6, 68.3, 67.5, 67, 66.8, 66.6, 65.8, 65.6,
64.9, 64.6, 64.5, 64.5, 64.3, 64.2, 63.9, 63.7, 62.7, 62.3,
62.2, 59.4, 57.8, 57.8, 57.6, 56.4, 53.6, 53.2)
h <- hist(weights, freq=TRUE)
binwi <- diff(h$breaks)[1]
x <- seq(min(weights)-10, max(weights)+10, 0.01)
xn <- dnorm(x, mean=mean(weights), sd=sd(weights)) * length(weights) * binwi
lines(x, xn)
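The scaling factor used above follows from how hist() relates counts and densities: count = density * n * bin width. A minimal sketch that checks this identity without drawing anything (the vector x and the other names are illustrative assumptions):

```r
# Small illustrative sample
x <- c(53.2, 57.8, 62.3, 64.5, 66.8, 70.5, 71.6, 73.5, 78.4, 85.5, 97.6)

# Compute the histogram without plotting it
h <- hist(x, plot = FALSE)

n    <- length(x)
binw <- diff(h$breaks)               # width of each bin

# Counts recovered from the density scale
counts_from_density <- h$density * n * binw

# Normal density rescaled to the count scale, evaluated at the bin midpoints
expected_counts <- dnorm(h$mids, mean = mean(x), sd = sd(x)) * n * binw

print(rbind(observed = h$counts, from_density = counts_from_density))
print(expected_counts)
```

The same factor n * binw is what turns the dnorm curve into a line that sits on a count-scaled histogram.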

How can I get standard errors for my 4 parameters when the Hessian matrix from solnp is 5 by 5?

I'm using the solnp() function in the R package Rsolnp to solve a nonlinear regression with constraints. It works well, converges with no problem. I want to use the Hessian matrix to calculate standard errors of the four parameter estimates, but the Hessian is not 4 by 4 as I had expected, but 5 by 5. I looked around on SO and didn't see anyone else with an unexpected Hessian size. All the examples I found with the Hessians printed showed them to be the expected size of p by p (e.g., 2x2, 3x3, and 4x4).
How can I get standard errors for my 4 parameters from this 5 by 5 Hessian?
df <- data.frame(
Recruit.N = c(78.4, 79.8, 106, 57.4, 81.7, 94.4, 74.1, 42, 61.6, 47.7, 61.8,
28.1, 32.3, 19, 23.4, 20.1, 27),
Stock.5 = c(66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4, 81.7, 94.4,
74.1, 42, 61.6, 47.7, 61.8, 28.1),
Stock.6 = c(25.2, 66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4, 81.7,
94.4, 74.1, 42, 61.6, 47.7, 61.8),
Stock.7 = c(23.8, 25.2, 66.6, 90.3, 138.5, 79.8, 77.3, 78.4, 79.8, 106, 57.4,
81.7, 94.4, 74.1, 42, 61.6, 47.7)
)
lossfcn <- function(parz, mydat) {
alpha <- parz[[1]]
beta <- parz[[2]]
p5 <- parz[[3]]
p6 <- parz[[4]]
p7 <- 1 - p5 - p6
S <- with(mydat, p5*Stock.5 + p6*Stock.6 + p7*Stock.7)
Obs <- mydat$Recruit.N
Pred <- alpha * S * exp(-beta*S)
Resid <- log(Obs) - log(Pred)
sigma <- sqrt(mean(Resid^2))
LL <- dlnorm(Obs, meanlog=log(Pred), sdlog=sigma, log=TRUE)
-sum(LL)
}
inequal <- function(parz, mydat) {
parz[3] + parz[4]
}
library(Rsolnp)
solnp(pars=c(1, 0.008, 1/3, 1/3), fun=lossfcn, mydat=df,
ineqfun=inequal, ineqLB=0, ineqUB=1,
LB=c(0, 0, 0, 0), UB=c(1000, 1000, 1, 1), control=list(trace=0))
$pars
[1] 6.731317e-01 1.888572e-10 8.141363e-01 1.858631e-01
$convergence
[1] 0
$values
[1] 79.87150 75.50927 75.50927 75.50927
$lagrange
[,1]
[1,] -2.028222
$hessian
[,1] [,2] [,3] [,4] [,5]
[1,] 0.3350868 -3.359077e-01 17.84919 -0.4306057 -0.3382811
[2,] -0.3359077 1.993956e+02 -10161.63351 -7.0844295 -2.2749785
[3,] 17.8491854 -1.016163e+04 548099.69224 -85.9544831 -224.0362766
[4,] -0.4306057 -7.084429e+00 -85.95448 25.1086694 5.8817704
[5,] -0.3382811 -2.274979e+00 -224.03628 5.8817704 4.1978178
$ineqx0
[1] 0.9999995
$nfuneval
[1] 142
$outer.iter
[1] 3
$elapsed
Time difference of 0.03016496 secs
$vscale
[1] 1 1 1 1 1 1
Unlike the 3 posts you linked, you have an inequality constraint. Check the ineqx0 in the returned values: the other posts have NULL but you have 0.9999995. With an inequality constraint, there is a slack variable, so the problem is augmented. The hessian matrix returned is for this augmented set of parameters. Just retain the first 4 x 4 submatrix of the hessian for your wanted parameters.
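Once the rows and columns belonging to the parameters are known, standard errors follow from the inverse of that block, assuming the objective is a negative log-likelihood. The helper se_from_hessian below is a hypothetical sketch, not part of Rsolnp, and it is demonstrated on a toy matrix rather than on the output above. Which indices belong to the slack variable is left as an explicit argument, because the ordering of the augmented Hessian should be confirmed against the Rsolnp documentation or source before relying on the result.

```r
# Hypothetical helper: standard errors from a block of a Hessian of a
# negative log-likelihood. `idx` selects the rows/columns of the parameters.
se_from_hessian <- function(H, idx) {
  H_sub <- H[idx, idx, drop = FALSE]
  vc <- solve(H_sub)        # inverse Hessian approximates the covariance matrix
  sqrt(diag(vc))
}

# Toy 3 x 3 "augmented" Hessian standing in for the solnp result
H <- diag(c(4, 25, 100))

# Keep only the block assumed to belong to the parameters of interest
se <- se_from_hessian(H, idx = 1:2)
print(se)
```

With a fitted solnp object res (name assumed), the call would be se_from_hessian(res$hessian, idx) with idx set to the four parameter positions.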

R: How to or should I drop an insignificant orthogonal polynomial basis in a linear model?

I have soil moisture data with x-, y- and z-coordinates like this:
gue <- structure(list(x = c(311939.1507, 311935.4607, 311924.7316, 311959.553,
311973.5368, 311953.3743, 311957.9409, 311948.3151, 311946.7169,
311997.0803, 312017.5236, 312006.0245, 312001.5179, 311992.7044,
311977.3076, 311960.4159, 311970.6047, 311957.2564, 311866.4246,
311870.8714, 311861.4461, 311928.7096, 311929.6291, 311929.4233,
311891.2915, 311890.3429, 311900.8905, 311864.4995, 311870.8143,
311866.9257, 312002.571, 312017.816, 312004.5024, 311947.1186,
311943.0152, 311952.2695, 311920.6095, 311929.8371, 311918.6095,
312011.9019, 311999.5755, 312011.1461, 311913.7251, 311925.3459,
311944.4701, 311910.2079, 311908.7618, 311896.0776, 311864.4814,
311856.9027, 311857.5747, 311967.3779, 311962.2024, 311956.8318,
311977.5254, 311971.1776, 311982.537, 311993.4709, 312004.6407,
312015.6118, 311990.8601, 311994.686, 311988.3037, 311990.518,
311986.3918, 311998.8876, 311923.9157, 311903.4563, 311915.714,
311856.9087, 311858.9812, 311874.5867, 311963.9099, 311938.4542,
311945.9505, 311804.3039, 311797.2571, 311791.6967, 311921.3965,
311928.9353, 311920.0597, 311833.5109, 311829.8683, 311847.6261,
311889.1243, 311902.4909, 311901.245, 311981.1118, 312005.7098,
311976.5858, 311819.8901, 311816.4143, 311819.4172, 311870.418,
311873.2656, 311888.3401, 311910.8377, 311897.6697, 311902.4571,
311846.8196, 311833.6235, 311846.2942, 311931.3916, 311930.1891,
311947.659, 311792.2642, 311793.2539, 311794.1931, 311795.1288,
311796.0806, 311797.0142, 311797.95, 311798.8822, 311799.8229,
311800.7774, 311801.7094, 311802.6395, 311803.583, 311804.5185,
311805.4558, 311806.391, 311807.3346, 311808.2757, 311809.2187,
311810.1549, 311811.1014, 311812.0366, 311812.9667, 311813.9107,
311814.8373, 311815.7777, 311816.7365, 311817.6522, 311818.6091,
311819.5335, 311820.4961, 311821.4337, 311822.3855, 311823.3195,
311824.2713, 311825.214, 311826.1705, 311827.1188, 311828.0501,
311828.9893, 311829.9324, 311830.8706, 311831.8181, 311832.7667,
311833.705, 311834.6546, 311835.609, 311836.5527, 311837.5157,
311838.4495, 311839.3926, 311840.3423, 311841.2799, 311842.2288,
311843.1691, 311844.118, 311845.0746, 311846.019, 311846.9709,
311847.9201, 311848.859, 311849.8105, 311850.7503, 311851.6889,
311852.6355, 311853.6045, 311854.5296, 311855.4717, 311856.4171,
311857.3759, 311858.3151, 311859.2604, 311860.2178, 311861.1636,
311862.1071, 311863.0347, 311863.9857, 311864.9316, 311865.8722,
311866.8158, 311867.7702, 311868.7155, 311869.649, 311870.6018,
311871.5449, 311872.4871, 311873.4352, 311874.385, 311875.3042,
311876.2617, 311877.2068, 311878.1429, 311879.0956, 311880.0401,
311880.9822, 311881.929, 311882.8651, 311883.8017, 311884.7429,
311885.6949, 311886.6349, 311887.7207, 311888.6653, 311889.6041,
311890.5358, 311891.4838, 311892.4292, 311893.3736, 311894.326,
311895.2703, 311896.2182, 311897.1635, 311898.1032, 311899.0496,
311899.9967, 311900.9456, 311901.8889, 311902.8162, 311903.7566,
311904.6996, 311905.6627, 311906.5899, 311907.5448, 311908.4856,
311909.4399, 311910.3649, 311911.3188, 311912.2629, 311913.2022,
311914.1527, 311915.1025, 311916.0425, 311916.985, 311917.9254,
311918.8661, 311919.8174, 311920.7668, 311921.7026, 311922.6517,
311923.5949, 311924.5252, 311925.4599, 311926.422, 311927.3646,
311928.3, 311929.2432, 311930.1796, 311931.1358, 311932.0726,
311933.0069, 311933.9585, 311934.845, 311935.7788, 311936.7193,
311937.6441, 311938.572, 311939.5094, 311940.4666, 311941.4067,
311942.3489, 311943.2712, 311944.2195, 311945.1536, 311946.0927,
311947.0413, 311947.9761, 311948.9082, 311949.8557, 311950.8201,
311951.7616, 311952.7148, 311953.7894, 311954.7289, 311955.6646,
311956.6081, 311957.5588, 311958.4896, 311959.4297, 311960.3761,
311961.3191, 311962.2653, 311963.195, 311964.1501, 311965.0856,
311966.0254, 311966.9739, 311967.9305, 311968.8592, 311971.7861,
311970.758, 311969.8205), y = c(5846548.408, 5846546.489, 5846538.014,
5846525.283, 5846510.302, 5846503.516, 5846529.769, 5846523.06,
5846522.742, 5846512.263, 5846525.347, 5846522.042, 5846537.487,
5846545.587, 5846532.112, 5846425.917, 5846406.543, 5846434.03,
5846500.989, 5846498.286, 5846487.134, 5846488.045, 5846483.29,
5846468.713, 5846534.269, 5846533.527, 5846504.056, 5846453.395,
5846438.43, 5846442.608, 5846406.8, 5846434.58, 5846419.229,
5846441.045, 5846436.903, 5846447.917, 5846460.757, 5846457.428,
5846451.067, 5846445.596, 5846474.031, 5846457.239, 5846532.694,
5846553.938, 5846565.323, 5846446.926, 5846432.549, 5846467.236,
5846473.963, 5846464.78, 5846498.142, 5846458.168, 5846474.018,
5846489.801, 5846559.513, 5846589.975, 5846555.723, 5846553.847,
5846560.066, 5846560.792, 5846455.642, 5846546.374, 5846465.999,
5846432.091, 5846422.061, 5846442.871, 5846485.956, 5846472.811,
5846506.756, 5846416.327, 5846419.623, 5846413.124, 5846587.334,
5846600.116, 5846589.515, 5846463.69, 5846456.712, 5846459.683,
5846600.118, 5846574.99, 5846597.804, 5846419.496, 5846437.615,
5846436.902, 5846567.872, 5846572.857, 5846556.904, 5846388.146,
5846393.088, 5846390.13, 5846481.09, 5846496.127, 5846493.586,
5846545.396, 5846532.126, 5846538.334, 5846388.343, 5846416.117,
5846392.223, 5846513.526, 5846486.644, 5846512.917, 5846395.509,
5846386.421, 5846383.873, 5846459.062, 5846459.36, 5846459.682,
5846460.026, 5846460.377, 5846460.703, 5846461.047, 5846461.378,
5846461.73, 5846462.071, 5846462.418, 5846462.765, 5846463.115,
5846463.466, 5846463.815, 5846464.128, 5846464.505, 5846464.843,
5846465.189, 5846465.52, 5846465.869, 5846466.217, 5846466.557,
5846466.893, 5846467.237, 5846467.586, 5846467.903, 5846468.274,
5846468.601, 5846468.943, 5846469.258, 5846469.592, 5846469.909,
5846470.247, 5846470.565, 5846470.891, 5846471.24, 5846471.536,
5846471.885, 5846472.224, 5846472.553, 5846472.884, 5846473.225,
5846473.532, 5846473.89, 5846474.179, 5846474.502, 5846474.827,
5846475.146, 5846475.448, 5846475.768, 5846476.102, 5846476.428,
5846476.746, 5846477.069, 5846477.37, 5846477.685, 5846478.009,
5846478.335, 5846478.656, 5846478.958, 5846479.299, 5846479.608,
5846479.926, 5846480.267, 5846480.603, 5846480.908, 5846481.246,
5846481.56, 5846481.877, 5846482.19, 5846482.503, 5846482.825,
5846483.144, 5846483.468, 5846483.811, 5846484.13, 5846484.458,
5846484.8, 5846485.125, 5846485.456, 5846485.778, 5846486.112,
5846486.421, 5846486.75, 5846487.08, 5846487.401, 5846487.744,
5846488.067, 5846488.39, 5846488.728, 5846489.067, 5846489.383,
5846489.716, 5846490.054, 5846490.38, 5846490.719, 5846491.044,
5846491.357, 5846491.694, 5846492.005, 5846492.402, 5846492.726,
5846493.045, 5846493.389, 5846493.708, 5846494.049, 5846494.363,
5846494.686, 5846494.982, 5846495.3, 5846495.64, 5846495.957,
5846496.263, 5846496.584, 5846496.911, 5846497.241, 5846497.591,
5846497.914, 5846498.226, 5846498.553, 5846498.893, 5846499.221,
5846499.538, 5846499.869, 5846500.19, 5846500.508, 5846500.82,
5846501.151, 5846501.492, 5846501.827, 5846502.147, 5846502.471,
5846502.803, 5846503.129, 5846503.46, 5846503.783, 5846504.11,
5846504.448, 5846504.76, 5846505.118, 5846505.445, 5846505.79,
5846506.106, 5846506.465, 5846506.795, 5846507.118, 5846507.448,
5846507.758, 5846508.081, 5846508.396, 5846508.645, 5846508.99,
5846509.34, 5846509.685, 5846510.031, 5846510.363, 5846510.693,
5846511.031, 5846511.362, 5846511.694, 5846512.024, 5846512.354,
5846512.701, 5846513.034, 5846513.353, 5846513.683, 5846513.998,
5846514.32, 5846514.636, 5846514.956, 5846515.326, 5846515.65,
5846515.968, 5846516.301, 5846516.634, 5846516.971, 5846517.318,
5846517.64, 5846517.952, 5846518.308, 5846518.626, 5846518.937,
5846519.27, 5846519.597, 5846519.921, 5846520.245, 5846520.581,
5846521.498, 5846521.209, 5846520.893), z = c(26.485, 26.411,
26.339, 27.248, 27.208, 26.799, 27.199, 27.023, 26.973, 26.908,
26.275, 26.474, 26.316, 26.226, 27.184, 25.903, 25.765, 25.931,
26.057, 26.181, 26.102, 26.436, 26.457, 26.396, 25.585, 25.572,
26.448, 25.637, 25.603, 25.634, 25.847, 26.185, 25.899, 26.016,
25.873, 26.299, 26.358, 26.344, 26.088, 26.264, 26.3, 26.306,
26.311, 25.857, 26.004, 25.824, 25.798, 26.326, 26.03, 25.625,
25.78, 26.368, 26.225, 26.582, 26.398, 25.343, 26.253, 25.908,
25.323, 25.381, 26.3, 26.179, 26.284, 26.024, 25.896, 26.251,
26.447, 26.385, 26.419, 25.188, 25.176, 25.169, 25.348, 25.188,
25.291, 25.285, 25.266, 25.262, 25.333, 25.308, 25.314, 25.145,
25.172, 25.22, 25.235, 25.204, 25.286, 25.155, 25.397, 25.202,
25.373, 25.327, 25.341, 25.172, 25.253, 25.318, 25.023, 25.24,
25.132, 25.264, 25.38, 25.221, 25.119, 25.179, 25.083, 25.258,
25.254, 25.235, 25.252, 25.266, 25.256, 25.264, 25.26, 25.262,
25.265, 25.265, 25.285, 25.28, 25.257, 25.254, 25.258, 25.287,
25.294, 25.282, 25.27, 25.268, 25.309, 25.303, 25.3, 25.312,
25.305, 25.3, 25.314, 25.319, 25.328, 25.304, 25.325, 25.308,
25.332, 25.333, 25.333, 25.346, 25.344, 25.339, 25.355, 25.362,
25.36, 25.391, 25.418, 25.434, 25.436, 25.447, 25.486, 25.5,
25.526, 25.552, 25.551, 25.564, 25.589, 25.606, 25.641, 25.672,
25.689, 25.709, 25.736, 25.758, 25.782, 25.836, 25.844, 25.866,
25.88, 25.935, 25.984, 26.037, 26.066, 26.071, 26.094, 26.106,
26.106, 26.118, 26.1, 26.146, 26.135, 26.156, 26.169, 26.162,
26.173, 26.198, 26.196, 26.228, 26.258, 26.276, 26.283, 26.277,
26.236, 26.277, 26.251, 26.264, 26.26, 26.261, 26.249, 26.307,
26.289, 26.243, 26.206, 26.231, 26.224, 26.238, 26.244, 26.245,
26.254, 26.2, 26.229, 26.24, 26.248, 26.223, 26.29, 26.344, 26.371,
26.364, 26.311, 26.343, 26.342, 26.334, 26.317, 26.342, 26.315,
26.312, 26.322, 26.325, 26.324, 26.32, 26.308, 26.329, 26.31,
26.32, 26.327, 26.34, 26.371, 26.442, 26.442, 26.483, 26.504,
26.526, 26.562, 26.562, 26.538, 26.534, 26.533, 26.541, 26.584,
26.642, 26.65, 26.691, 26.719, 26.755, 26.786, 26.794, 26.849,
26.867, 26.919, 26.93, 26.945, 26.947, 26.959, 26.984, 26.992,
27.006, 27.035, 27.021, 27.052, 27.094, 27.104, 27.119, 27.16,
27.182, 27.223, 27.236, 27.267, 27.304, 27.331, 27.348, 27.341,
27.379, 27.355, 27.378, 27.357, 27.373, 27.319, 27.299, 27.278,
27.28, 27.295, 27.288, 27.286, 27.279), soil_m_sat = c(24.1,
24.2, 26.9, 13.9, 20.6, 34.1, 16.2, 16.7, 16, 22.1, 23.9, 27.2,
26.8, 34.4, 26.3, 54.1, 51, 44.9, 46.4, 45.9, 54.7, 39.1, 38.7,
40.7, 56.5, 56.3, 40.6, 60.9, 56.8, 56.3, 40.7, 40.4, 44.1, 44.9,
46.2, 45.3, 46.1, 43.7, 44.9, 45.4, 33.1, 45.8, 27.6, 47.8, 37.3,
58.9, 51.4, 42.1, 46, 66.6, 51.1, 31.6, 48.7, 32.9, 28.1, 84,
37.7, 38.2, 80.4, 73.3, 35.6, 44.2, 39.7, 50.2, 49.9, 37.8, 37,
41.7, 27.3, 100, 100, 100, 80.9, 100, 88.4, 89.6, 93.8, 95.3,
91.9, 93.9, 96.1, 91.4, 100, 94.4, 100, 100, 80, 94.1, 84.4,
91.1, 80, 78.9, 85.9, 100, 97.5, 87.2, 88.6, 83.3, 90.7, 100,
82.2, 100, 96.3, 93.3, 99.6, 92.1, 92.8, 90.9, 92.3, 91.2, 94.5,
91.8, 89.4, 87, 86, 88, 83.7, 88.8, 92.9, 89.3, 83.3, 83.5, 84.5,
85.8, 87.4, 86.5, 82, 78.1, 85.8, 85.6, 88.7, 87.7, 84.9, 82,
87.9, 85.5, 86, 82, 83, 88.5, 81.2, 81.6, 76.5, 77.6, 84.5, 81.5,
82, 82.4, 68, 67.7, 62.1, 68.9, 61.7, 68.5, 68.6, 65.3, 59.5,
60.8, 67.3, 66.2, 59.9, 50.9, 46.9, 44.6, 47.9, 53, 52.1, 48.3,
41.3, 53.8, 51, 47, 53.7, 49.5, 51.1, 44.4, 35.1, 42.2, 41.5,
40, 48.2, 46.7, 48.6, 51.7, 51.2, 52.3, 53.4, 48.9, 50.7, 48.5,
46.5, 39.4, 38, 49.2, 43.6, 47.1, 40.4, 44.7, 45.7, 38.1, 41.9,
39.3, 40.2, 43.8, 47.3, 50.1, 41.2, 39.8, 46, 40.8, 40, 37.8,
42.6, 46, 43.8, 45.4, 42.2, 46.5, 40.4, 39.9, 53, 44.7, 35.8,
42.9, 43.9, 43.2, 40.6, 40.8, 32.2, 32.6, 33.5, 36.7, 34.6, 34.7,
50.9, 35.6, 34.2, 28.1, 42, 32, 42.3, 30, 29.6, 31, 29.8, 26,
37.8, 40, 37, 30.2, 28.2, 26.2, 27.4, 22.1, 28.4, 23.2, 24.8,
26.5, 23.9, 21.1, 27.2, 20.8, 12.5, 14, 17.9, 19.7, 19.4, 26,
16.7, 18.2, 23.9, 19, 25.9, 24.4, 22.1, 19.2, 18.4, 24.7, 17.3,
19.4, 19.6, 17.7, 21.3, 22.1, 17.9, 28.2, 16.3, 25.3, 19.7, 21.7,
19, 18.8, 11.8, 15.6, 9.8, 17.7)), .Names = c("x", "y", "z",
"soil_m_sat"), class = "data.frame", row.names = c(NA, -296L))
In order to estimate a variogram for this data I need to remove the spatial trend from it. The soil moisture, of course, varies with the surface: the higher a point is, the drier it is. Since the soil moisture data is given as a percentage, the relationship is hardly linear, which leads me to allow up to cubic dependence of the soil moisture on the z-coordinate. There happens to be a small, more or less elliptic elevation in this area, so I want to allow the soil moisture to depend on the x- and y-coordinates quadratically. I hope the following model does exactly this:
polymod <- lm(soil_m_sat ~ poly(x + y, degree = 2) + poly(z, degree = 3), data = gue)
summary(polymod)
The summary shows that the first coefficient of the x- and y-dependency (which summary names poly(x + y, degree = 2)1) is not significant. Because the help page of poly() says that it "returns or evaluates orthogonal polynomials of degree 1 to degree", I thought that removing a degree-one polynomial from the model might be the same as removing the first coefficient of the degree-2 polynomial. Therefore I tried to remove it like this:
mod <- lm(soil_m_sat ~ poly(x + y, degree = 2) - poly(x + y, degree = 1) + poly(z, degree = 3), data = gue)
summary(mod)
But the summary of mod looks exactly the same as the summary of polymod, meaning mod does not differ from polymod. How can I remove the insignificant component then?
No, don't check with summary in this case. You should use anova. A polynomial term from poly(), or a spline term from bs(), contains more than one coefficient, so it behaves like a factor variable with multiple levels.
> anova(polymod)
Analysis of Variance Table
Response: soil_m_sat
                         Df Sum Sq Mean Sq F value    Pr(>F)
poly(x + y, degree = 2)   2 113484   56742  1600.8 < 2.2e-16 ***
poly(z, degree = 3)       3  68538   22846   644.5 < 2.2e-16 ***
Residuals               290  10280      35
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The ANOVA table clearly shows that you need all model terms. Do not drop any.
That said, I still need to answer your question.
It is not impossible to drop the poly(x + y, degree = 2)1 term, but you need to access the model matrix for this purpose. You may do
gue$XY_poly <- with(gue, poly(x + y, degree = 2))[, 2] ## use the 2nd column only
fit <- lm(soil_m_sat ~ XY_poly + poly(z, degree = 3), data = gue)
summary(fit)
## ...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.3071 0.3459 151.217 < 2e-16 ***
XY_poly -18.8515 7.3894 -2.551 0.0112 *
poly(z, degree = 3)1 -418.1634 6.4937 -64.395 < 2e-16 ***
poly(z, degree = 3)2 116.5327 6.9171 16.847 < 2e-16 ***
poly(z, degree = 3)3 -28.7773 5.9517 -4.835 2.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.951 on 291 degrees of freedom
Multiple R-squared: 0.9464, Adjusted R-squared: 0.9457
F-statistic: 1285 on 4 and 291 DF, p-value: < 2.2e-16
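Why a single column of poly() can be used on its own is easier to see on a small example. The sketch below uses made-up data (x, y and the model names are illustrative assumptions) and a model that contains only the polynomial term. The columns of poly() are orthonormal and orthogonal to the intercept, so in this setting dropping the first column leaves the coefficient of the second unchanged.

```r
# Made-up data, for illustration only
x <- c(1, 2, 4, 7, 11, 16)
y <- c(2.1, 3.9, 8.2, 13.8, 22.5, 31.7)

# Orthogonal polynomial basis of degree 2
P <- poly(x, degree = 2)

# The columns are orthonormal: crossprod(P) is the identity matrix
print(round(crossprod(P), 10))

# Full model with both basis columns, reduced model with the 2nd column only
full    <- lm(y ~ P)
reduced <- lm(y ~ P[, 2])

print(coef(full))
print(coef(reduced))
```

In the model from the question the z polynomial is also present, and it is not orthogonal to the x + y basis, so estimates there can shift when a column is dropped.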

ks.test with left truncated weibull

The distribution I expect my values to follow is a left-truncated Weibull distribution. I know the parameters a, shape and scale of this distribution for use with the ptrunc command:
require(truncdist);
ptrunc(x,"weibull",a=a,scale=scale,shape=shape)
So I want the ks.test command (see below) to use the described left-truncated Weibull distribution instead of the ordinary (untruncated) Weibull.
myvalues<-c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1, 29.9, 30.1, 34.7, 38.7, 32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1, 41.5, 32.8, 31.9, 41.3, 30.5, 39.9, 35.0, 31.2, 35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4);
a<-7;
scale<-36.37516;
shape<-9.437013;
I know that in this case the left-side truncation is not necessary, but in other cases it will be.
ks.test(myvalues,"pweibull",scale=scale,shape=shape) # for the ordinary Weibull
but
ks.test(myvalues,ptrunc(x,"weibull",a=a,scale=scale,shape=shape)) # for the left-truncated Weibull
gives a wrong result.
First of all, ptrunc should be replaced by rtrunc. ptrunc returns a vector of probability values, but when the second argument of ks.test is a numeric vector it is treated as a second sample, and a sample is what rtrunc gives us. If the argument a of rtrunc is set to -Inf, there is no truncation, and the result with a=-Inf is indeed the same as with a=7:
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
a <- 7
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myvalues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myvalues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.21429, p-value = 0.2898
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.21429, p-value = 0.2898
alternative hypothesis: two-sided
But still the result of ks.test( myvalues, "pweibull",scale=scale,shape=shape ) is different:
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
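The reason is that the simulated reference sample is too small: rtrunc draws only as many values as there are elements in its first argument, here the 42 entries of myvalues. If we pass a larger vector in the call of rtrunc (not ks.test), ks0, ks1, and ks2 are almost the same: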
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
myManyValues <- c(outer((0:9999)/100000,myvalues,"+"))
a <- 7
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myManyValues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myManyValues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
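The effect of the reference sample size can be reproduced without truncdist. The sketch below is only an illustration with base R's rweibull on simulated data (x, ref_small and ref_large are hypothetical names). The two-sample statistic against a reference sample as small as the data is noisy, while against a very large reference sample it approaches the one-sample statistic.

```r
shape <- 9.437013
scale <- 36.37516

set.seed(1)
x         <- rweibull(42,  shape = shape, scale = scale)  # stand-in for the data
ref_small <- rweibull(42,  shape = shape, scale = scale)  # small reference sample
ref_large <- rweibull(1e5, shape = shape, scale = scale)  # large reference sample

# One-sample test against the theoretical CDF
d_one   <- ks.test(x, "pweibull", shape = shape, scale = scale)$statistic
# Two-sample tests against simulated reference samples
d_small <- ks.test(x, ref_small)$statistic
d_large <- ks.test(x, ref_large)$statistic

c(one_sample       = unname(d_one),
  two_sample_small = unname(d_small),
  two_sample_large = unname(d_large))
```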
Now let's see what happens when we do truncate the distribution:
library(truncdist)
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4)
myManyValues <- c(outer((0:9999)/100000,myvalues,"+"))
a <- 29
scale<-36.37516
shape <- 9.437013
set.seed(1)
y1 <- rtrunc(myManyValues,"weibull",a=-Inf,scale=scale,shape=shape)
set.seed(1)
y2 <- rtrunc(myManyValues,"weibull",a=a,scale=scale,shape=shape)
set.seed(1)
ks0 <- ks.test( myvalues, "pweibull",scale=scale,shape=shape )
set.seed(1)
ks1 <- ks.test( myvalues, y1 )
set.seed(1)
ks2 <- ks.test( myvalues, y2 )
> ks0
One-sample Kolmogorov-Smirnov test
data: myvalues
D = 0.15612, p-value = 0.2576
alternative hypothesis: two-sided
> ks1
Two-sample Kolmogorov-Smirnov test
data: myvalues and y1
D = 0.15655, p-value = 0.2548
alternative hypothesis: two-sided
> ks2
Two-sample Kolmogorov-Smirnov test
data: myvalues and y2
D = 0.2059, p-value = 0.05683
alternative hypothesis: two-sided
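ks.test also accepts a cumulative distribution function as its second argument, which gives a one-sample test and avoids simulating a reference sample. The following is a minimal sketch using base R only: ptweibull_left is a hypothetical helper written for this illustration (it is not part of truncdist), and x is simulated data, not myvalues.

```r
# Hypothetical helper: CDF of a Weibull left-truncated at a,
# i.e. P(X <= q | X > a).
ptweibull_left <- function(q, a, shape, scale) {
  Fa <- pweibull(a, shape = shape, scale = scale)
  Fq <- pweibull(q, shape = shape, scale = scale)
  ifelse(q <= a, 0, (Fq - Fa) / (1 - Fa))
}

a     <- 29
shape <- 9.437013
scale <- 36.37516

# Simulated stand-in data drawn from the truncated distribution
# by inverting the CDF.
set.seed(1)
Fa <- pweibull(a, shape = shape, scale = scale)
x  <- qweibull(Fa + runif(42) * (1 - Fa), shape = shape, scale = scale)

# One-sample test against the truncated CDF.
res <- ks.test(x, function(q) ptweibull_left(q, a, shape, scale))
res$statistic
res$p.value
```

With truncdist loaded, it should be possible to pass a wrapper around ptrunc in the same way, for example function(q) ptrunc(q, "weibull", a = a, scale = scale, shape = shape), although that variant is not tested here.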
You are using the ptrunc function incorrectly (I assume); it needs to be fed a sequence of quantiles. Below I calculate the mean and standard deviation of the Weibull distribution from your scale and shape parameters, then evaluate ptrunc on a grid running from 5 standard deviations below the mean to 5 above it to produce a comparison set.
require(truncdist);
myvalues <- c(37.5, 35.4, 27.1, 32.9, 35.9, 35.1, 34.1, 32.5, 35.5, 31.5, 38.2, 36.1,29.9, 30.1, 34.7, 38.7 ,32.3, 38.0, 34.9, 44.2, 35.8, 30.8, 39.3, 26.0, 34.2, 40.0, 36.1 ,41.5 ,32.8, 31.9, 41.3 ,30.5, 39.9, 35.0 ,31.2 ,35.0, 30.3, 29.0, 34.4, 35.7, 34.1, 35.4);
a <- 7;
scale <- 36.37516;
shape <- 9.437013;
# Calculate mean and standard deviation of the Weibull
weib_mean <- scale * gamma(1 + 1/shape)
weib_sd <- sqrt((scale^2) * (gamma(1 + 2/shape) - (gamma(1 + 1/shape))^2))
# Get a sample
quant <- seq(weib_mean - 5 * weib_sd, weib_mean + 5 * weib_sd, length.out = 1E5)
weibull_samp <- ptrunc(quant, "weibull", a = a, scale = scale, shape = shape)
# Take a look
plot(weibull_samp ~ quant)
# Use with test
> ks.test(sort(myvalues), weibull_samp)
Two-sample Kolmogorov-Smirnov test
data: sort(myvalues) and weibull_samp
D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided
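A statistic of D = 1 is what a two-sample test returns when the two vectors do not overlap at all. ptrunc returns cumulative probabilities between 0 and 1, whereas myvalues lies between 26.0 and 44.2, so the call above compares probabilities with measurements. The sketch below reproduces this with base R's pweibull and a few made-up values (x, quant and probs are hypothetical names).

```r
shape <- 9.437013
scale <- 36.37516

x     <- c(30.5, 35.0, 40.2)                  # made-up measurements
quant <- seq(20, 45, length.out = 200)        # grid of quantiles
probs <- pweibull(quant, shape = shape, scale = scale)

range(probs)   # all values lie between 0 and 1
res <- ks.test(x, probs)
res$statistic  # the two empirical CDFs never overlap
```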

Resources