I'm very inexperienced with R, but I'm required to use it for the statistics class I'm taking. I'm trying to make a dot plot using
library(lattice)
dotplot(Bio$SS,
main = "Plants by Number of Short Shoots",
xlab = "Number of Short Shoots",
ylab = "Number of Plants")
However, the graph doesn't provide a count for the y-value. It looks like this instead:
As you can see, there are no y-values given to the dot plot, even though it should be listing the number of plants with each value. When I made a histogram using a similar formula it worked fine:
hist(Bio$SS,
main = "Plants by Number of Short Shoots",
xlab = "Number of Short Shoots",
ylab = "Number of Plants",
col = "green")
Here is how that turned out:
This one properly provided the count for a y-value. How can I make the dotplot do the same?
Here's the data I'm using:
structure(list(ï..Block = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), Treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("NFCT", "NFNP", "SFCT", "SFNP"), class = "factor"),
Plant = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), Stem = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L), SS = c(4L, 2L, 3L, 2L,
1L, 2L, 5L, 5L, 4L, 4L, 5L, 3L, 3L, 2L, 4L, 2L, 6L, 3L, 10L,
2L, 5L, 2L, 6L, 2L, 4L), LS = c(4L, 7L, 1L, 7L, 7L, 6L, 5L,
5L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 4L, 1L, 4L, 4L, 4L,
2L, 4L, 1L), Leaves = c(30L, 30L, 13L, 32L, 32L, 35L, 33L,
34L, 27L, 23L, 21L, 20L, 25L, 24L, 25L, 25L, 24L, 25L, 29L,
20L, 20L, 22L, 25L, 23L, 13L), Inf. = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), TLength = c(10.5, 11.2, 6.2, 12.2, 11.3,
11.5, 11.9, 11.7, 10, 11.5, 10.9, 12.2, 12.6, 12.2, 12.1,
12, 6.5, 6.7, 13, 6.2, 7.6, 5.9, 7.7, 6, 5.6)), row.names = c(NA,
25L), class = "data.frame")
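A minimal sketch of one possible approach, assuming base graphics are acceptable in place of lattice: give each observation a height equal to its running count within its value, so the top dot of every stack sits at the number of plants with that value. The vector `ss` copies the `SS` column from the data above; `dot_height` is a name introduced here for illustration.

```r
# SS values copied from the dput() output above
ss <- c(4, 2, 3, 2, 1, 2, 5, 5, 4, 4, 5, 3, 3, 2, 4, 2, 6, 3, 10,
        2, 5, 2, 6, 2, 4)

# For each observation, its position (1, 2, 3, ...) among the
# observations that share the same value
dot_height <- ave(ss, ss, FUN = seq_along)

# Stacked dot plot: the tallest dot in each column is the count
plot(ss, dot_height, pch = 19,
     main = "Plants by Number of Short Shoots",
     xlab = "Number of Short Shoots",
     ylab = "Number of Plants")

# The same counts as a table, for checking against the plot
table(ss)
```

With real data, `Bio$SS` would take the place of `ss`.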
I have several categorical variables, and I need horizontal bar plots showing the frequency of each of their modalities. For example, suppose I want a horizontal bar plot of the variable INTERET_ENVIRONNEMENT, whose modalities are:
> unique(DATABASE$INTERET_ENVIRONNEMENT)
[1] Beaucoup Un peu Pas du tout
Levels: Beaucoup Pas du tout Un peu
Then, using the code below:
ords <- c("Beaucoup", "Un peu", "Pas du tout")
ggplot(DATABASE, aes(x = INTERET_ENVIRONNEMENT)) +
geom_bar(fill = "orange", width = 0.7) +
scale_x_discrete(limits = ords) +
coord_flip() +
xlab("Storm Type") + ylab("Number of Observations")
I get this
Now I want to add all the other categorical variables so that their horizontal bar plots appear in the same figure.
For example, I also want to add the INTERET_COMPOSITION variable, which has the same modalities ("Beaucoup", "Un peu", "Pas du tout").
I tried this code:
ggplot(DATABASE, aes(x = INTERET_ENVIRONNEMENT)) +
geom_bar(fill = "orange", width = 0.7) +
scale_x_discrete(limits = ords) +
coord_flip() +
xlab("Storm Type") + ylab("Number of Observations")+
facet_wrap(~INTERET_COMPOSITION)
But it doesn't give the result I need.
To make my example reproducible, here is a data set that contains 4 categorical variables with the same modalities:
structure(list(INTERET_COMPOSITION = structure(c(1L, 1L, 1L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 1L,
1L, 1L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Beaucoup",
"Pas du tout", "Un peu"), class = "factor"), INTERET_ENVIRONNEMENT = structure(c(1L,
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 1L,
1L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("Beaucoup", "Pas du tout", "Un peu"), class = "factor"),
INTERET_ORIGINE_GEO = structure(c(1L, 2L, 1L, 1L, 3L, 1L,
3L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 2L, 1L, 1L, 3L, 1L, 1L,
1L, 1L, 3L, 3L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L), .Label = c("Beaucoup",
"Pas du tout", "Un peu"), class = "factor"), INTERET_ALIM_NATURELLE = structure(c(1L,
3L, 3L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 2L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("Beaucoup", "Pas du tout", "Un peu"
), class = "factor")), .Names = c("INTERET_COMPOSITION",
"INTERET_ENVIRONNEMENT", "INTERET_ORIGINE_GEO", "INTERET_ALIM_NATURELLE"
), row.names = c(1L, 2L, 3L, 5L, 9L, 13L, 14L, 16L, 18L, 19L,
20L, 24L, 27L, 29L, 30L, 32L, 33L, 35L, 36L, 37L, 39L, 44L, 49L,
51L, 52L, 53L, 55L, 56L, 61L, 62L, 63L, 65L, 66L, 67L, 71L, 74L,
75L, 80L, 81L, 84L, 86L, 90L, 92L, 95L, 96L, 99L, 100L, 103L,
104L, 107L), class = "data.frame")
How can I plot their horizontal bar plots in the same figure?
You have to transform your data from wide to long format (d below is the data frame from the question):
library(tidyverse)
d %>%
gather(k, v) %>%
ggplot(aes(v)) +
geom_bar(fill = "orange", width = 0.7) +
coord_flip() +
facet_wrap(~k)
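To make the reshaping step concrete, here is a small base R sketch of what the wide-to-long transformation produces. The toy data frame and its column names (`INTERET_A`, `INTERET_B`) are made up for illustration; `k` and `v` match the names used in the `gather()` call. In current versions of tidyr, `pivot_longer(everything(), names_to = "k", values_to = "v")` is the newer equivalent of `gather(k, v)`.

```r
# Toy wide data: one column per question, same modalities in each
d <- data.frame(
  INTERET_A = factor(c("Beaucoup", "Un peu", "Beaucoup", "Pas du tout")),
  INTERET_B = factor(c("Un peu", "Un peu", "Beaucoup", "Beaucoup"))
)

# Long format: one row per (variable, answer) pair
long <- data.frame(
  k = rep(names(d), each = nrow(d)),                      # variable name
  v = unlist(lapply(d, as.character), use.names = FALSE)  # answer
)

# Frequencies per variable: these are the bar lengths in each facet
counts <- table(long$k, long$v)
counts
```

Each row of `counts` corresponds to one facet of the plot, which is why faceting on `k` gives one panel per original variable.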
I am running nonlinear PCA in R, using the homals package. Here is a chunk of the code I am using as an example:
res1 <- homals(data = mydata, rank = 1, ndim = 9, level = "nominal")
res1 <- rescale(res1)
I want to generate 1000 bootstrap estimates of the eigenvalues in this analysis (with replacement), but I can't figure out the code. Does anyone have any suggestions?
Sample data:
dput(head(mydata, 30))
structure(list(`W age` = c(45L, 43L, 42L, 36L, 19L, 38L, 21L,
27L, 45L, 38L, 42L, 44L, 42L, 38L, 26L, 48L, 39L, 37L, 39L, 26L,
24L, 46L, 39L, 48L, 40L, 38L, 29L, 24L, 43L, 31L), `W education` = c(1L,
2L, 3L, 3L, 4L, 2L, 3L, 2L, 1L, 1L, 1L, 4L, 2L, 3L, 2L, 1L, 2L,
2L, 2L, 3L, 3L, 4L, 4L, 4L, 2L, 4L, 4L, 4L, 1L, 3L), `H education` = c(3L,
3L, 2L, 3L, 4L, 3L, 3L, 3L, 1L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 2L,
2L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 4L), `N children` = c(10L,
7L, 9L, 8L, 0L, 6L, 1L, 3L, 8L, 2L, 4L, 1L, 1L, 2L, 0L, 7L, 6L,
8L, 5L, 1L, 0L, 1L, 1L, 5L, 8L, 1L, 0L, 0L, 8L, 2L), `W religion` = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), `W employment` = c(1L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L), `H occupation` = c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 2L, 4L, 2L, 2L,
2L, 2L, 4L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 2L, 2L, 1L), `Standard of living` =
c(4L,
4L, 3L, 2L, 3L, 2L, 2L, 4L, 2L, 3L, 3L, 4L, 3L, 3L, 1L, 4L, 4L,
3L, 1L, 1L, 1L, 4L, 4L, 4L, 3L, 4L, 4L, 2L, 4L, 4L), Media = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Contraceptive = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("W age",
"W education", "H education", "N children", "W religion", "W employment",
"H occupation", "Standard of living", "Media", "Contraceptive"
), row.names = c(NA, 30L), class = "data.frame")
I was given the rescale function to use with the homals package, to do optimal scaling. Here is the function:
rescale <- function(res) {
# Rescale homals results to proper scaling
n <- nrow(res$objscores)
m <- length(res$catscores)
res$objscores <- (n * m)^0.5 * res$objscores
res$scoremat <- (n * m)^0.5 * res$scoremat
res$catscores <- lapply(res$catscores, FUN = function(x) (n * m)^0.5 * x)
res$cat.centroids <- lapply(res$cat.centroids, FUN = function(x) (n * m)^0.5 * x)
res$low.rank <- lapply(res$low.rank, FUN = function(x) n^0.5 * x)
res$loadings <- lapply(res$loadings, FUN = function(x) m^0.5 * x)
res$discrim <- lapply(res$discrim, FUN = function(x) (n * m)^0.5 * x)
res$eigenvalues <- n * res$eigenvalues
return(res)
}
The standard way to bootstrap in R is to use the boot package, a recommended package that ships with R.
I am not very satisfied with the code that follows because it throws lots of warnings, but maybe this is due to the dataset I tested it with: I used the dataset and the 3rd example in help("homals").
I ran only 10 bootstrap replicates.
library(homals)
library(boot)
boot_eigen <- function(data, indices){
d <- data[indices, ]
res <- homals(d, active = c(rep(TRUE, 4), FALSE), sets = list(c(1,3,4),2,5))
res$eigenvalues
}
data(galo)
set.seed(7578) # Make the results reproducible
eig <- boot(galo, boot_eigen, R = 10)
eig
#
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = galo, statistic = boot_eigen, R = 10)
#
#
#Bootstrap Statistics :
# original bias std. error
#t1* 0.1874958 0.03547116 0.005511776
#t2* 0.2210821 -0.02478596 0.005741331
colMeans(eig$t)
#[1] 0.2229669 0.1962961
If this also doesn't run properly in your case, please say so and I will delete the answer.
EDIT.
To address the discussion in the comments, I have changed the function boot_eigen: the call to homals now follows the code in the question, and rescale is called before the eigenvalues are returned.
boot_eigen <- function(data, indices){
d <- data[indices, ]
res <- homals(data = d, rank = 1, ndim = 9, level = "nominal")
res <- rescale(res)
res$eigenvalues
}
set.seed(7578) # Make the results reproducible
eig <- boot(mydata, boot_eigen, R = 10)
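The question asks for 1000 replicates, and the replicates still need to be summarised afterwards. The sketch below shows that pattern with `R = 1000` and percentile intervals. So that it runs without homals installed, it swaps in a stand-in statistic (eigenvalues of a correlation matrix) and the built-in `mtcars` data; both are assumptions for illustration, and `eigen_stat` is a hypothetical name. With homals available, `boot_eigen` and `mydata` would take their place.

```r
library(boot)

# Stand-in statistic (an assumption, not the homals analysis):
# eigenvalues of the correlation matrix of the resampled rows
eigen_stat <- function(data, indices) {
  d <- data[indices, ]
  eigen(cor(d), symmetric = TRUE, only.values = TRUE)$values
}

set.seed(7578)  # make the results reproducible
eig <- boot(mtcars[, c("mpg", "disp", "hp", "wt")], eigen_stat, R = 1000)

# eig$t has one row per replicate and one column per eigenvalue
dim(eig$t)

# Percentile 95% interval for each eigenvalue (one row per eigenvalue)
ci <- t(apply(eig$t, 2, quantile, probs = c(0.025, 0.975)))
ci
```

`boot.ci(eig, type = "perc", index = 1)` from the same package is another way to get an interval for a single eigenvalue.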
I have the following data from an experiment in which I want to find out how a bacterium responds to a treatment at two similar levels (nucleic acids).
Treatment happened after the sampling on day 0 (vertical dashed line). As you can see, it got more abundant (line is average, dots are measured triplicates). I have 3 technical replicates (doing the lab work 3 times on the same sample) but no biological replicates.
For publication purposes, I want to show that the induced change is significant. So far I used a two tailed t test for heteroscedastic samples, using the 3 sample points day -25 to 0 as sample group 1 and 5 sample points day 3 to 17 as sample group 2 (this is the range where most of my bacteria reacted).
Afterwards I performed the Bonferroni correction on the p values to correct for multiple testing. But is this the correct way and is it possible with only technical replicates?
I have found many hints on fitting models to my graph, but I only want to test the statistical significance of the difference between before and after treatment. So I'm looking for the correct statistical test and for how to apply it in R. Any help is appreciated!
Here is the code for the plot:
require(ggplot2)
require(scales)
ggplot(data=sample_data, aes(x=days-69,y=value,colour=nucleic_acid,group=nucleic_acid,lty=nucleic_acid))+
geom_vline(aes(xintercept=0),linetype="dashed", size=1.2)+
geom_point(aes(),colour="black")+
stat_summary(aes(colour=nucleic_acid),colour="black",fun.y="mean", geom="line", size=1.5)+
scale_linetype_manual(values=c("dna"=1,"cdna"=4),
name="Nucleic acid ",
breaks=c("cdna","dna"),
labels=c("16S rRNA","16S rDNA"))+
scale_x_continuous(breaks = scales::pretty_breaks(n = 20))+
theme_bw()+
scale_y_continuous(label= function(x) {ifelse(x==0, "0", parse(text=gsub("[+]", "", gsub("e", " %*% 10^", scientific_format()(x)))))})+
theme(axis.title.y = element_text(angle=90,vjust=0.5))+
theme(axis.text=element_text(size=12))+
theme(legend.text=element_text(size=11))+
theme(panel.grid.major=element_line(colour = NA, size = 0.2))+
theme(panel.grid.minor=element_line(colour = NA, size = 0.5))+
theme(legend.position="bottom")+
theme(legend.background = element_rect(fill="grey90",linetype="solid"))+
labs(x="Days",
y=expression(atop("Absolute abundance in cell equivalents",bgroup("[",relative~abundance~x~cells~mL^{-1},"]"))))
And here is my data:
sample_data<-structure(list(time = c(10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L,
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L,
13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L,
15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 18L,
18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L,
7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L,
9L, 9L), days = c(83L, 83L, 83L, 83L, 83L, 83L, 86L, 86L, 86L,
86L, 86L, 86L, 91L, 91L, 91L, 91L, 91L, 91L, 98L, 98L, 98L, 98L,
98L, 98L, 105L, 105L, 105L, 105L, 105L, 105L, 112L, 112L, 112L,
112L, 112L, 112L, 119L, 119L, 119L, 119L, 119L, 119L, 126L, 126L,
126L, 126L, 133L, 133L, 133L, 133L, 133L, 133L, 140L, 140L, 140L,
140L, 140L, 140L, 44L, 44L, 44L, 44L, 44L, 44L, 62L, 62L, 62L,
62L, 62L, 62L, 69L, 69L, 69L, 69L, 69L, 69L, 72L, 72L, 72L, 72L,
72L, 72L, 76L, 76L, 76L, 76L, 76L, 76L, 79L, 79L, 79L, 79L, 79L,
79L), parallel = c(3L, 1L, 2L, 2L, 3L, 1L, 2L, 3L, 3L, 2L, 1L,
1L, 2L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 1L,
1L, 3L, 2L, 1L, 1L, 2L, 3L, 3L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 3L,
1L, 1L, 3L, 2L, 3L, 1L, 1L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 2L,
3L, 3L, 1L, 1L, 2L, 2L, 3L, 1L, 1L, 3L, 2L, 1L, 2L, 3L, 3L, 1L,
2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 3L, 3L, 1L, 2L, 3L,
3L, 1L, 2L), nucleic_acid = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("cdna", "dna"), class = "factor"),
habitat = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "water", class = "factor"),
value = c(5316639.62, 6402573.912, 6294710.95, 2369809.996,
2679661.691, 2105693.166, 2108794.224, 2487177.041, 6021765.438,
5524939.499, 6016021.786, 2628427.206, 3164229.113, 896068.7656,
2966515.364, 4436008.425, 1860580.149, 3911309.508, 888489.0268,
1004334.365, 1141636.992, 961140.0729, 1072009.18, 1134997.852,
668013.4333, 459645.1058, 645944.1129, 702293.6865, 590620.3693,
642136.7523, 932531.1588, 1224299.065, 1502344.5, 1545034.46,
1122002.798, 1411050.57, 1465061.711, 1378876.488, 810348.2823,
1361496.248, 1056558.288, 897876.4169, 931519.9524, 1165768.09,
957873.9045, 746011.7558, 624116.5603, 522209.2283, 551120.1371,
440096.4446, 565108.4447, 373304.8604, 266595.7171, 333767.4042,
185612.6681, 144899.8736, 173739.3969, 211490.827, 223815.0867,
296455.4243, 1278759.217, 247292.4355, 1171554.199, 1146278.577,
227443.8462, 233542.6719, 253224.2629, 875040.4892, 1151921.616,
1285744.479, 355381.9156, 110724.7928, 252238.9632, 912865.3372,
608269.6498, 500307.5301, 774955.9598, 1374106.94, 3121909.308,
1071086.757, 3033665.589, 2984567.998, 1396313.444, 1356465.773,
4480581.956, 4273141.231, 4957691.655, 1910056.657, 5520085.32,
5094686.657, 5990052.759, 2272441.566, 1513268.608, 1821716.75
), treatment2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Treatment", class = "factor")), .Names = c("time",
"days", "parallel", "nucleic_acid", "habitat", "value", "treatment2"
), class = "data.frame", row.names = c(51243L, 51244L, 51245L,
51246L, 51247L, 51248L, 51255L, 51256L, 51257L, 51258L, 51259L,
51260L, 51267L, 51268L, 51269L, 51270L, 51271L, 51272L, 51279L,
51280L, 51281L, 51282L, 51283L, 51284L, 51291L, 51292L, 51293L,
51294L, 51295L, 51296L, 51303L, 51304L, 51305L, 51306L, 51307L,
51308L, 51315L, 51316L, 51317L, 51318L, 51319L, 51320L, 51326L,
51327L, 51328L, 51329L, 51336L, 51337L, 51338L, 51339L, 51340L,
51341L, 51348L, 51349L, 51350L, 51351L, 51352L, 51353L, 51360L,
51361L, 51362L, 51363L, 51364L, 51365L, 51372L, 51373L, 51374L,
51375L, 51376L, 51377L, 51384L, 51385L, 51386L, 51387L, 51388L,
51389L, 51396L, 51397L, 51398L, 51399L, 51400L, 51401L, 51408L,
51409L, 51410L, 51411L, 51412L, 51413L, 51420L, 51421L, 51422L,
51423L, 51424L, 51425L))
If you want to test for significance of the effect of your treatment and you know how to fit model(s) on your data, you can simply fit a model which includes your treatment effect and a model which doesn't. Then compare the models by means of a likelihood ratio test.
In R it is pretty straightforward (I assume for simplicity a linear model, which anyway may not be the best choice, based on your data):
# Models fit
model_effect <- lm(y~Time + Treatment, data)
model_null <- lm(y~Time, data)
# Models comparison
anova(model_effect, model_null)
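A runnable sketch of the comparison above on simulated data. The data frame `dat_sim`, its columns, and the effect sizes are all made up for illustration; they are not the measurements from the question. Note that for two nested `lm` fits, `anova()` reports an F test by default.

```r
set.seed(1)
n <- 40

# Simulated series: linear trend in Time plus a jump after treatment
dat_sim <- data.frame(Time = seq_len(n))
dat_sim$Treatment <- factor(ifelse(dat_sim$Time > 20, "after", "before"),
                            levels = c("before", "after"))
dat_sim$y <- 10 + 0.1 * dat_sim$Time +
  5 * (dat_sim$Treatment == "after") + rnorm(n)

# Model without and with the treatment effect
model_null   <- lm(y ~ Time, data = dat_sim)
model_effect <- lm(y ~ Time + Treatment, data = dat_sim)

# Nested model comparison (smaller model first)
cmp <- anova(model_null, model_effect)
cmp

# p-value for adding Treatment to the model
p_value <- cmp[["Pr(>F)"]][2]
p_value
```

A small `p_value` here only says that the model with `Treatment` fits the simulated data better; whether a linear model suits the real data is a separate question, as noted above.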
I am trying to replicate a 3-factor nested ANOVA analysis from a paper: Underwood, AJ (1993) The Mechanics of spatially replicated sampling programmes to detect environmental impacts in a variable world.
The data for the example (from Table 3, Underwood 1993) can be produced by:
dat <-
structure(list(B = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("A", "B"), class = "factor"), C = structure(c(2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("C", "I"), class = "factor"),
Times = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
Locations = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L), X = c(59L, 51L, 45L, 46L, 40L, 32L, 39L, 32L, 25L, 51L,
44L, 37L, 55L, 47L, 41L, 31L, 38L, 45L, 41L, 47L, 55L, 43L,
36L, 29L, 23L, 30L, 37L, 57L, 50L, 43L, 36L, 44L, 51L, 39L,
29L, 23L, 38L, 44L, 52L, 31L, 38L, 45L, 42L, 35L, 28L, 52L,
44L, 37L, 51L, 43L, 37L, 38L, 31L, 24L, 60L, 52L, 46L, 30L,
37L, 44L, 41L, 34L, 27L, 53L, 46L, 39L, 40L, 34L, 26L, 21L,
27L, 35L), Times.unique = structure(c(5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A_1", "A_2", "A_3",
"A_4", "B_1", "B_2", "B_3", "B_4"), class = "factor")), .Names = c("B",
"C", "Times", "Locations", "Y", "Times.unique"), row.names = c(NA,
-72L), class = "data.frame")
dat
The data frame dat has 4 factors:
B - has two levels "A" and "B" (before v after)
Times - has 8 levels, 4 within before ("B") and 4 within after ("A"), coded as 1:4 within each. Note that the variable Times.unique is the same thing but with a unique code for each time (before and after)
Locations - has three levels, all measured every time both before and after
C - has two levels, control (C) and impact (I). Note: two locations are control and one is impact
While I am clear on how to analyse such a design using mixed models (lmer), I would like to replicate his example exactly so that I can run some simulations to compare his method.
In particular I am attempting to replicate the SS values presented in table 4 under column "a". He fits a design that has SS and df values for the following terms:
B -> SS = 66.13, df = 1
Times(B) -> SS = 280.64, df = 6
Locations -> SS = 2836.86, df = 2
B x Locations -> SS = 29.26, df = 2
Times(B) x Locations-> SS = 575.45, df = 12
Residual -> SS = 2420.00, df = 48
Total -> SS = 6208.34, df = 71
I assume the Times(B) term represents Times nested within the Before/After treatment "B". For this example he ignores that Locations are from control and impact treatments and leaves out factor C altogether.
I have tried all possible combinations I can think of to reproduce this nested anova, using both unique Times coding and Times coded as 1:4 within B (before and after). I have tried using %in%, / and Error() arguments, as well as Anova from car to change the type of SS calculated. Examples of the %in% and / nested fits include:
aov(Y~B+Locations+Times%in%B+B:Locations+Times%in%B:Locations, data=dat)
aov(Y~B+Locations+B/Times+B:Locations+B/Times:Locations, data=dat)
I seem to be unable to replicate Underwood's SS values exactly, particularly for the two interaction terms. A friend let me fit the model in statistix, where the SS values can be reproduced exactly, so it is possible to obtain the above SS values for this model.
Can anyone help me fit this model in R? I wish to embed it in a larger simulation, so I really need to be able to run the model in R such that the Underwood 1993 SS values are reproduced exactly.
Your problem is that dat$Locations is an integer, when it should be a factor (three unique locations). One hint is that your ANOVA line thinks Locations takes up only 1 df, while Underwood gives it 2.
Simply add the line:
dat$Locations = factor(dat$Locations)
And then your line of code reproduces the Underwood results perfectly:
aov(Y~B+Locations+B/Times+B:Locations+B/Times:Locations, data=dat)
#Call:
# aov(formula = Y ~ B + Locations + B/Times + B:Locations + B/Times:Locations,
# data = dat)
#
#Terms:
# B Locations B:Times B:Locations B:Locations:Times
#Sum of Squares 66.1250 2836.8611 280.6389 29.2500 575.4444
#Deg. of Freedom 1 2 6 2 12
# Residuals
#Sum of Squares 2420.0000
#Deg. of Freedom 48
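The effect of the integer versus factor coding on the degrees of freedom can be seen in a small sketch. The data frame `toy` is simulated for illustration and is unrelated to Underwood's data.

```r
set.seed(42)

# Three locations, six observations each, coded as integers
toy <- data.frame(Locations = rep(1:3, each = 6), Y = rnorm(18))

# Integer predictor: treated as a numeric slope, so it uses 1 df
fit_int <- aov(Y ~ Locations, data = toy)
df_int <- summary(fit_int)[[1]][["Df"]][1]

# Factor predictor: one parameter per extra level, so 3 levels use 2 df
toy$Locations <- factor(toy$Locations)
fit_fac <- aov(Y ~ Locations, data = toy)
df_fac <- summary(fit_fac)[[1]][["Df"]][1]

c(integer = df_int, factor = df_fac)
```

Checking `str(dat)` before fitting is a quick way to catch a grouping variable that was read in as an integer.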