How are residuals of aov() calculated? - r

I'm wondering how residuals in aov() are calculated. I have already looked for hours but can't figure it out.
I am using an ANOVA for repeated measurements.
Data <- data.frame(subject = factor(rep(1:10, 3)),
                   age = factor(c(rep(4, 10),
                                  rep(10, 10),
                                  rep(35, 10))),
                   weight = c(20, 9, 16, 14, 30, 26, 26, 27, 13, 15,
                              27, 18, 30, 26, 43, 48, 38, 38, 22, 47,
                              50, 44, 52, 46, 64, 70, 73, 57, 54, 63))
ANOVA_MW <- aov(weight ~ age + Error(subject / age),
                data = Data)
summary(ANOVA_MW)
I know that the following command gives me something.
round(ANOVA_MW$`subject:age`$residuals, 2)
However, I get only 20 rather than 30 values, and they start at observation 11. This probably has something to do with the residuals of subject; I don't know.
The result of proj(ANOVA_MW) gives me the residuals that I calculated manually (value - personal mean - group mean + overall mean).
My question is: what are the other residuals above, and why does everybody (or so it feels) use them for normality testing?
I would love some helpful input. I already dove into the function but could not find an explanation.
Thanks.

residual sum of squares = total sum of squares - factor sum of squares
In your case, the factor is age.
The residuals should be normally distributed; this is one of the assumptions of ANOVA.
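To make the relation to proj() concrete, here is a minimal sketch (assuming the Data and ANOVA_MW objects from the question are in the workspace) that reproduces the "subject:age" residuals from proj() with the manual formula the poster mentions (value - personal mean - group mean + overall mean):
proj_res <- proj(ANOVA_MW)[["subject:age"]][, "Residuals"]
manual_res <- with(Data,
                   weight
                   - ave(weight, subject)  # personal (subject) means
                   - ave(weight, age)      # age-group means
                   + mean(weight))         # overall mean
round(cbind(proj = proj_res, manual = manual_res), 2)  # the two columns agree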

Related

Understanding krige.bayes() output

I am struggling with the krige.bayes() function in the geoR package. I was hoping to create a map from the output of the function, but I can't seem to find a way to do this. The online PDF of the geoR package (https://cran.r-project.org/web/packages/geoR/geoR.pdf) indicates that you can make an image using geoR::image.kriging, however I get the error 'image.kriging' is not an exported object from 'namespace:geoR' when I do this. When using ls("package:geoR") this function does not appear, indicating that it has been deprecated and just not removed from the package documentation. This leaves me with just the output from the krige.bayes() function and some variogram models that I have created as well. I can see that I can modify the output with output.control, however I'm not sure what I can change in there to make the output more comprehensible to me. The output that I am getting from krige.bayes() is as follows.
Only samples of the posterior for the parameters will be returned.
krige.bayes: computing the discrete posterior of phi/tausq.rel
krige.bayes: argument `phi.discrete` not provided, using default values
krige.bayes: computing the posterior probabilities.
Number of parameter sets: 50
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50
krige.bayes: sampling from posterior distribution
krige.bayes: sample from the (joint) posterior of phi and tausq.rel
[,1] [,2] [,3]
phi 1.869439e-03 0.003738877 0.007477755
tausq.rel 0.000000e+00 0.000000000 0.000000000
frequency 9.940000e+02 5.000000000 1.000000000
Am I misunderstanding this output, the next step, or something else? Thanks in advance for the help.
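As a hedged sketch of one way forward (my.geodata, my.model and my.prior below are placeholders, not objects from the post): image.kriging and image.krige.bayes are S3 methods in geoR, so they are called through the generic image() rather than as geoR::image.kriging, and krige.bayes() only has a surface to map if prediction locations were supplied, which is what the message "Only samples of the posterior for the parameters will be returned" is pointing at.
library(geoR)
# Placeholder prediction grid covering the study area (adjust to your coordinates).
pred.grid <- expand.grid(x = seq(0, 1, length.out = 51),
                         y = seq(0, 1, length.out = 51))
kb <- krige.bayes(my.geodata,               # placeholder geodata object
                  locations = pred.grid,    # without this, there is nothing to map
                  model = my.model,         # placeholder model.control()
                  prior = my.prior,         # placeholder prior.control()
                  output = output.control(n.posterior = 1000))
image(kb, locations = pred.grid)            # dispatches to geoR's image method for krige.bayes objects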

Sensitivity analysis for ODE with list as parameters - result gives standard deviation of 0

Note: the initial problem was "Sensitivity analysis for ODE with parameters that include lists", because the sensRange function gave an error due to the lists passed in the parameters. The question evolved once the list parameters were fixed, but a different problem appeared: the sensitivity analysis showed strange results, with a standard deviation of 0 for all runs.
I have a model simulating the concentrations of a chemical over time in R using the deSolve package. The parameters I use are the chemical properties (volume of distribution, half-life, etc.) as well as weight over time. Weight is given in a list which is indexed at each time step in the ODE, because weight increases over time, and the chemical properties are given as numerics.
I would like to perform a sensitivity analysis for this model, testing only the chemical properties. I have not done a sensitivity analysis before, but I am trying to follow examples using sensRange(). However, the sensRange() function does not seem to allow one of the parameters to be a list. I get the error:
Error in yRef[, ivar] : invalid subscript type 'list'
My code for the model and global sensitivity analysis is set up like this:
library(FME)
library(deSolve)
c.weight <- c(3.5, 4, 5, 5, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 38, 39, 39, 39, 40, 40, 40, 41, 41, 41, 42, 42, 42, 43, 43, 43, 44, 44, 44, 45, 45, 45, 46, 46, 46, 47, 47, 47, 48, 48, 48, 49, 49, 49, 50, 50, 50, 51, 51, 52, 54, 55)
params.sens <- c(vd = 0.2,
                 pm = 0.05,
                 halflife = 5,
                 dose = 0.001,
                 c.weight = c.weight)

solve.sens <- function(pars) {
  sens.model <- function(times, state, parameters) {
    with(as.list(c(state, parameters)), {
      if (times <= 5) {
        volume <- (((0.2 * times) * c.weight[times + 1]) * 30)
        transferred <- pm * volume
      } else if (times > 5 & times <= 14) {
        volume <- ((0.1 * c.weight[times + 1]) * 30)
        transferred <- pm * volume
      } else {
        transferred <- 0
      }
      intake.c <- (dose * c.weight[times + 1])
      elimination.c <- concentration * vd * c.weight[times + 1] * log(2) / (halflife * 12)
      # total
      concentration <- (intake.c + transferred - elimination.c) / (vd * c.weight[times + 1])
      list(c(concentration))
    })
  }
  state <- c(concentration = 0.5)
  months <- seq(0, 144, 1)
  return(as.data.frame(ode(y = state, times = months, func = sens.model, parms = params.sens)))
}
out <- solve.sens(params.sens)
parRanges <- data.frame(min = c(0.02, 0.1, 2.1), max = c(0.09, 0.2, 5.5))
rownames(parRanges) <- c("pm", "vd", "halflife")
sens <- sensRange(func = solve.sens, parms = params.sens, dist = "latin", sensvar = "concentration", parRange = parRanges, num = 50)
head(summary(sens))
summ.sens <- summary(sens)
plot(summ.sens, xlab = "months", ylab = "concentration")
I don't know how to go forward. Does anyone have any tips, or see where my mistake is?
Edit: I followed the bacterial growth model from Soetaert and Herman (2009) to correct my =/<- errors and added the parameter values from the comments into the model. Now it runs with no error; however, the summary shows identical values everywhere (mean, min, max, and all quantiles), so I am assuming it is not running correctly:
x Mean Sd Min Max q05 q25 q50 q75 q95
concentration0 0 0.500000 0 0.500000 0.500000 0.500000 0.500000 0.500000 0.500000 0.500000
concentration1 1 1.246348 0 1.246348 1.246348 1.246348 1.246348 1.246348 1.246348 1.246348
concentration2 2 3.475493 0 3.475493 3.475493 3.475493 3.475493 3.475493 3.475493 3.475493
concentration3 3 7.170403 0 7.170403 7.170403 7.170403 7.170403 7.170403 7.170403 7.170403
concentration4 4 12.314242 0 12.314242 12.314242 12.314242 12.314242 12.314242 12.314242 12.314242
concentration5 5 18.890365 0 18.890365 18.890365 18.890365 18.890365 18.890365 18.890365 18.890365
The original post confused = and <- to create a named vector, so I recommended the following code snippet:
params.sens <- c(vd = p.vd,
                 pm = p.pm,
                 halflife = p.hl,
                 dose = concentrations$dose,
                 c.weight = variables$c.weight)
After the edit made by the original poster, this answer became mostly obsolete and was deleted. However, it turned out in a later post that the distinction between <- (assignment) and = (argument matching; here, the creation of a named vector) was not yet completely clear.
Here is an example that shows the difference. To avoid confusion, run it in a fresh R session or clear the workspace first:
#rm(list=ls()) # uncomment this to clear the work space
x <- c(a <- 2, b <- 3)
y <- c(d = 2, e = 3)
x
y
ls()
Here we see that y is a named vector, whereas x has no named elements. Instead, a and b were created on the fly as variables in the user's workspace:
> x
[1] 2 3
> y
d e
2 3
> ls()
[1] "a" "b" "x" "y"
After fixing the assignment mistakes, the script now runs through and the behavior can be reproduced.
Now we can see that the parameters passed to solve.sens were not passed down to the ode function.
A fix is to replace params.sens in the ode() call with pars, the argument that was passed to the enclosing function:
return(as.data.frame(ode(y = state, times = months, func = sens.model, parms = pars)))
Then
plot(summ.sens, xlab = "months", ylab = "concentration")
results in a sensitivity plot in which the runs now actually differ.
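For completeness, here is a tiny self-contained toy (my own illustration, not the poster's model; toy, decay and pr are made-up names) showing why forwarding pars matters: sensRange() calls func() with a perturbed copy of parms, so func() must hand exactly that object to ode(), otherwise every run solves the same system.
library(FME)
library(deSolve)
toy <- function(pars) {
  decay <- function(t, y, p) {
    with(as.list(c(y, p)), list(-k * y))  # dy/dt = -k * y
  }
  as.data.frame(ode(y = c(y = 1), times = 0:10, func = decay, parms = pars))
}
pr <- data.frame(min = 0.1, max = 1, row.names = "k")
s <- sensRange(func = toy, parms = c(k = 0.5), dist = "latin",
               sensvar = "y", parRange = pr, num = 20)
summary(s)  # Sd is no longer 0, because each run really uses a different k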

R, creating a knight's tour plot with a matrix indicating the path

I need to create a knight's tour plot from a matrix such as this example:
Mat = matrix(c(1, 38, 55, 34, 3, 36, 19, 22,
54, 47, 2, 37, 20, 23, 4, 17,
39, 56, 33, 46, 35, 18, 21, 10,
48, 53, 40, 57, 24, 11, 16, 5,
59, 32, 45, 52, 41, 26, 9, 12,
44, 49, 58, 25, 62, 15, 6, 27,
31, 60, 51, 42, 29, 8, 13, 64,
50, 43, 30, 61, 14, 63, 28, 7), nrow=8, ncol=8, byrow=T)
Numbers indicate the order in which knight moves to create a path.
I have a lot of these kinds of results, with chessboards up to 75 in size, but I have no way of presenting them in a readable way. I found out that R, given the matrix, is capable of creating a plot like this:
link (this one is 50x50 in size)
So for the matrix I presented, lines between two points are drawn between consecutive numbers: 1 - 2 - 3 - 4 - 5 - ... - 64, in the end creating a path like the one in the link, but for the 8x8 chessboard instead of 50x50.
However, I have very limited time to learn R well enough to accomplish this, so I am desperate for any kind of direction. How hard would it be to write R code that transforms any such matrix into such a plot? Or is it something trivial? Any code samples would be a blessing.
You can use geom_path as described here: ggplot2 line plot order
In order to do so you need to convert the matrix into a tibble.
library(tibble)
library(dplyr)
library(ggplot2)

coords <- tibble(col = rep(1:8, 8),
                 row = rep(1:8, each = 8))

coords %>%
  mutate(order = Mat[8 * (col - 1) + row]) %>%  # linear index picks Mat[row, col]
  arrange(order) %>%
  ggplot(aes(x = col, y = row)) +
  geom_path() +
  geom_text(aes(y = row + 0.25, label = order)) +
  coord_equal()  # ensures a square board
You can subtract .5 from the col and row positions to give a more natural chess board feel.
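Since the question mentions boards up to 75 in size, here is a sketch that generalises the code above to any n x n tour matrix (plot_tour is a made-up name, not part of any package):
library(tibble)
library(dplyr)
library(ggplot2)
plot_tour <- function(Mat) {
  n <- nrow(Mat)
  tibble(col = rep(1:n, n),
         row = rep(1:n, each = n),
         order = Mat[n * (col - 1) + row]) %>%  # linear index picks Mat[row, col]
    arrange(order) %>%
    ggplot(aes(x = col, y = row)) +
    geom_path() +
    geom_text(aes(y = row + 0.25, label = order), size = 2) +
    coord_equal()
}
plot_tour(Mat)  # the 8 x 8 example from the question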

Predict with glmer where new data is a RasterStack of fixed effects

I have constructed models with glmer and would like to predict them onto a RasterStack representing the fixed effects in my model. My glmer model has the form:
m1<-glmer(Severity ~ x1 + x2 + x3 + (1 | Year) + (1 | Ecoregion), family=binomial( logit ))
As you can see, I have random effects which I don't have as spatial layers, for example 'Year'. Therefore the problem is really predicting a glmer model onto a RasterStack when you don't have layers for the random effects. If I use it out of the box without adding my random effects, I get an error.
m1.predict=predict(object=all.var, model=m1, type='response', progress="text", format="GTiff")
Error in predict.averaging(model, blockvals, ...) :
Your question is very brief and does not indicate what, if any, trouble you have encountered. This seems to work 'out of the box', but perhaps not in your case. See ?raster::predict for options.
library(raster)
# example data. See ?raster::predict
logo <- brick(system.file("external/rlogo.grd", package="raster"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(pa=xy[,1], extract(logo, xy[,2:3])))
v$Year <- sample(2000:2001, nrow(v), replace=TRUE)
library(lme4)
m <- lmer(pa ~ red + blue + (1 | Year), data=v)
# here adding Year as a constant, as it is not a variable (RasterLayer) in the RasterStack object
x <- predict(logo, m, const=(data.frame(Year=2000)))
If you don't have the random effects, just use re.form=~0 in your predict call to predict at the population level:
x <- predict(logo, m, re.form=~0)
This works without complaint for me with @RobertH's example (although I haven't checked whether the result is correct).
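For completeness, the same re.form trick carries over to a binomial glmer like the one in the question; here is a sketch reusing the toy logo and v objects from the example above (only an illustration, not the poster's actual model):
g <- glmer(pa ~ red + blue + (1 | Year), data = v, family = binomial)
xg <- predict(logo, g, re.form = ~0, type = "response")  # population-level, response scale
plot(xg)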

Kolmogorov-Smirnov or a Chi-Square test for a distribution?

I used model fitting to fit the negative binomial distribution to my discrete data. As a final step, it looks like I need to perform a Kolmogorov-Smirnov test to determine whether the model fits the data well. All the references I could find talk about using the test for normally distributed continuous data. Can someone tell me if this can be done in R for data that is discrete and not normally distributed? (I'm guessing even a chi-square test would do, but please correct me if I'm wrong.)
UPDATE:
So I found that the vcd package contains a function goodfit that can be used for this purpose in the following way:
library(vcd)
# Define the data
data <- c(67, 81, 93, 65, 18, 44, 31, 103, 64, 19, 27, 57, 63, 25, 22, 150,
31, 58, 93, 6, 86, 43, 17, 9, 78, 23, 75, 28, 37, 23, 108, 14, 137,
69, 58, 81, 62, 25, 54, 57, 65, 72, 17, 22, 170, 95, 38, 33, 34, 68,
38, 117, 28, 17, 19, 25, 24, 15, 103, 31, 33, 77, 38, 8, 48, 32, 48,
26, 63, 16, 70, 87, 31, 36, 31, 38, 91, 117, 16, 40, 7, 26, 15, 89,
67, 7, 39, 33, 58)
gf <- goodfit(data, type = "nbinomial", method = "MinChisq")
plot(gf)
But after the gf <- ... step, R complains saying:
Warning messages:
1: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
2: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
3: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
and when I say plot it complains:
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' is a list, but does not have components 'x' and 'y'
I am not sure what is happening because if I set data to be the following:
data <- rnbinom(200, size = 1.5, prob = 0.8)
everything works fine. Any suggestions?
A KS test is for continuous variables only, and you have to fully specify the distribution you are testing against. If you still wanted to do it, it would look like this:
ks.test(data, pnbinom, size=100, prob=0.8)
It compares the empirical cumulative distribution function of data against the specified one (whether that makes sense probably depends on your data). You would have to choose parameters for size and prob based on theoretical considerations, the test is not valid if you estimate those parameters based on the data.
Your problem with goodfit() might have to do with your data; are you sure these are counts? barplot(table(data)) does not look like it approximately follows a negative binomial distribution; compare, e.g., with barplot(table(rnbinom(200, size = 1.5, prob = 0.8))).
Finally, I'm not sure if the approach of doing a null-hypothesis test after fitting is appropriate. You may want to look into quantitative fit measures beyond / based on $\chi^2$ of which there are many (RMSEA, SRMR, ...).
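As a hedged sketch of the alternatives mentioned above (assuming data is the count vector from the question): refit with maximum likelihood, which is goodfit's default method and may avoid the NaN warnings from the minimum-chi-square optimisation, and judge the fit graphically together with goodfit's built-in test.
library(vcd)
gf_ml <- goodfit(data, type = "nbinomial", method = "ML")
summary(gf_ml)  # likelihood-ratio goodness-of-fit test
plot(gf_ml)     # rootogram of observed vs fitted frequencies
# Visual comparison suggested above:
barplot(table(data))
barplot(table(rnbinom(200, size = 1.5, prob = 0.8)))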
