I want to run a paired t-test on a data frame. I think I grouped the data correctly, but I do not know why R reports this error:
Error in complete.cases(x, y) : not all arguments have the same length.
centre_g is my data frame containing all the information I want to use in my analysis. A paired t-test is the right method here.
str(centre_g)
# Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':
# 24 obs. of 17 variables
# (I list only the two variables used in my analysis):
# $ BA: Factor w/ 2 levels "after","before": 2 1 2 1 2 1 2 1 2 1 ...
# $ Pb: num 437 1183 1465 3105 NA ...
Previously, I extracted the "before" and "after" values of "Pb" as two separate vectors and ran a paired t-test on them, which works fine:
(tResult <- t.test(before$Pb, after$Pb, paired = TRUE))
But when I try to run the paired t-test directly on the data frame with a formula, I get the error message above:
(tResult <- t.test(Pb ~ BA, data = centre_g, paired = TRUE))
I tried several times, with grouped data and with sorted data, and I do not know what is wrong with the second method. Is it because of the NA values in my data frame? But then why does the first method work?
Since my data frame contains many more variables waiting to be analysed, I do not want to extract vectors for every one of them. I would like to run the paired t-test on the data frame directly. Could anyone help me?
The full centre_g is:
structure(list(day = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), SAMPLE.No = structure(c(1L,
13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 3L, 5L, 7L, 9L, 11L,
1L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 3L), .Label = c("s1",
"s1.2", "s10", "s10.2", "s11", "s11.2", "s12", "s12.2", "s13",
"s13.2", "s14", "s14.2", "s2", "s2.2", "s3", "s3.2", "s4", "s4.2",
"s5", "s5.2", "s6", "s6.2", "s7", "s7.2", "s8", "s8.2", "s9",
"s9.2"), class = "factor"), weir = c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 12L, 12L), BA = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L), .Label = c("after", "before"), class = "factor"), centre.bank = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("bank", "centre"), class = "factor"),
Pb = c(436.65, 1182.93, 1465.21, 3105.36, 39.1, 1493.91,
NA, 165.28, 38.83, 351.48, 80.26, 47.39, 151.27, 434.01,
-97.58, 240.83, 56.8, 40.24, 38.8, NA, 41.13, 38.93, 44.39,
39.05), Pb.Error = c(16.41, 30.01, 51.26, 102.44, 27.21,
79.63, NA, 13.82, 48.78, 16.71, 19.1, 21.43, 18.65, 21.41,
232.7, 18.83, 12.19, 15.28, 11.94, NA, 22.24, 14.01, 10.56,
9.63), Zn = c(542.52, 981.83, 1234.78, 7554.41, 529.38, 5240.01,
NA, 542.65, 526.08, 820.87, 649.7, 793.42, 707.23, 1204.3,
-34.56, 209.86, 172.5, 130.29, 187.96, NA, 234.57, 137.38,
165.21, 135.05), Zn.Error = c(19.5, 29.31, 48.12, 161.54,
42.36, 144.56, NA, 23.37, 52.5, 26.18, 33.33, 39.87, 31.89,
35.79, 44.83, 17.24, 15.11, 21.25, 19.76, NA, 26.65, 18.67,
15.12, 13.97), Fe = c(3731.23, 14239.54, 23774.52, 52349.37,
3896.63, 13311.26, NA, 2756.96, 3511.06, 2664.12, 2383.16,
2785.75, 2834.59, 6288.39, -321.14, 14704.05, 3825.8, 5017.52,
13181.67, NA, 31190.39, 8516.23, 14130, 18348.01), Fe.Error = c(106.82,
229.87, 432.59, 884.29, 239.03, 496.1, NA, 111.92, 283.9,
102.44, 137.69, 161.02, 137.66, 172.32, 187.37, 274.6, 140.64,
240.97, 310.62, NA, 565.41, 265.57, 260.75, 291.45), Mn = c(110.65,
1337.08, 1126.82, 3495.03, 410.99, 5267.34, NA, 314.42, 338.8,
591.99, 308.46, 427.59, 573.87, 896.23, 277.82, 421.17, 969.72,
535.07, 879.97, NA, 742.39, 350.62, 379.98, 834.36), Mn.Error = c(43.39,
93.86, 133.34, 297.53, 125.08, 410.14, NA, 63.25, 155.08,
68.16, 82.1, 96.34, 88.97, 89.89, 1470.88, 78, 92.24, 118.6,
112.32, NA, 134.87, 91.97, 72.7, 91.12), Cr = c(-38.15, 50.8,
25.9, 53.32, 21.52, 132.82, NA, 8.13, 5.46, 35.07, 93.78,
88.18, 71.23, 47.26, 32.91, 25.49, 10.36, 19.99, 5.13, NA,
32.61, 22.13, 47.5, -5.82), Cr.Error = c(9.05, 16.41, 7.7,
9.99, 4.58, 33.88, NA, 7.84, 2.86, 9.18, 8.75, 7.55, 7.98,
9.62, 6.38, 5.54, 6.72, 4.6, 6.5, NA, 6.64, 4.62, 9.51, 11.3
), Ca = c(32195.21, 46510.98, 21723.24, 17820.74, 14639.01,
45937.9, NA, 37840.08, 4704.64, 37705.36, 28625.21, 25115.24,
41579.19, 91829.16, 19752.96, 14605.4, 34654.73, 15798.87,
13873.07, NA, 22901.14, 4097.09, 12053.38, 276525.69), Ca.Error = c(211.2,
326.69, 160.54, 142.76, 120.63, 304.76, NA, 219.4, 66.28,
225.41, 187.03, 169.88, 226.15, 378.53, 149.92, 125.47, 208.18,
127.73, 127.4, NA, 168.31, 64.51, 128.02, 908.61)), row.names = c(1L,
4L, 6L, 8L, 10L, 12L, 13L, 16L, 17L, 19L, 21L, 23L, 26L, 28L,
29L, 32L, 34L, 36L, 38L, 39L, 42L, 43L, 46L, 48L), class = "data.frame")
I am interested in running a paired t-test on the "Pb" column, comparing "before" and "after" (as given in column "BA"). Each "weir" is one individual.
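I worked it out after a day. The cause was a row of NA data: at some places I did not manage to take samples, so there is a whole row of NA values (except in the factor columns).
To make sure the data frame keeps its full length (24 rows instead of 23) and does not drop the NA rows, I added na.rm = FALSE when subsetting the data frame into centre_g:
centre_g <- subset(HM_selected, centre.bank == "centre", na.rm = FALSE)
(I think the centre_g in my question is correct, but occasionally I got only 23 rows. I added na.rm to make explicit how NA data are handled.)
When running the paired t-test, I also added na.rm = FALSE:
(tResult <- t.test(Pb ~ BA, data = centre_g, paired = TRUE, na.rm = FALSE))
and that works for me.
Note: neither subset() nor t.test() has an na.rm argument; it is silently absorbed by their ... argument and has no effect, so the fix most likely came from centre_g having all 24 rows. Also be aware that the formula method drops rows with NA in Pb before splitting by BA, and then pairs "before" and "after" purely by position. In the data above, weir 4 is missing its "before" value and weir 10 its "after" value, so the pairs between them are shifted by one weir. Pairing explicitly by weir avoids this.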
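A minimal sketch of pairing explicitly by weir, which also covers the wish to test many columns without extracting vectors by hand. It assumes the layout of centre_g shown above (columns weir, BA with levels "after"/"before", and one column per measurement); the helper name paired_by_weir and the toy data are hypothetical, made up only to show the mechanics.

```r
# Hypothetical helper: paired t-test of column `var`, matching "before" and
# "after" rows by weir instead of by row position.
paired_by_weir <- function(data, var) {
  before <- data[data$BA == "before", c("weir", var)]
  after  <- data[data$BA == "after",  c("weir", var)]
  # merge() aligns the two groups on weir; NA values stay in their own row,
  # and t.test() then drops each incomplete pair as a whole.
  wide <- merge(before, after, by = "weir", suffixes = c(".before", ".after"))
  t.test(wide[[paste0(var, ".before")]], wide[[paste0(var, ".after")]],
         paired = TRUE)
}

# Toy data (made up) in the same layout, deliberately not sorted by BA
toy <- data.frame(
  weir = c(1, 1, 2, 2, 3, 3, 4, 4),
  BA   = factor(c("before", "after", "after", "before",
                  "before", "after", "before", "after"),
                levels = c("after", "before")),
  Pb   = c(10, 8, 11, 12, NA, 9, 15, 13),
  Zn   = c(5, 4, 6, 7, 8, 6, 9, 9)
)

res <- paired_by_weir(toy, "Pb")  # weir 3 is dropped as a whole pair
res$estimate                      # mean of the before - after differences

# One p-value per measurement column
sapply(c("Pb", "Zn"), function(v) paired_by_weir(toy, v)$p.value)
```

With the question's data this would be something like sapply(c("Pb", "Zn", "Fe", "Mn", "Cr", "Ca"), function(v) paired_by_weir(as.data.frame(centre_g), v)$p.value); as.data.frame() only drops the tibble grouping and may not be strictly necessary.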
Sorry for any confusion in the question.
Suppose I have this data:
df <- structure(list(len = c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2,
5.2, 7, 15.2, 21.5, 17.6, 9.7, 14.5, 10, 8.2, 9.4, 16.5, 9.7,
16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5, 19.7,
23.3, 23.6, 26.4, 20, 25.2, 25.8, 21.2, 14.5, 27.3, 23.6, 18.5,
33.9, 25.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5, 25.5, 26.4, 22.4,
24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23), supp = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("OJ",
"VC"), class = "factor"), dose = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("D0.5", "D1", "D2"
), class = "factor")), row.names = c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L,
40L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 41L, 42L,
43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 21L, 22L, 23L, 24L, 25L,
26L, 27L, 28L, 29L, 30L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L,
59L, 60L), class = "data.frame")
df$int <- interaction(df$supp, df$dose)
e <- pairwise.t.test(df$len, df$int, p.adjust.method="BH")
From the output:
        OJ.D0.5           VC.D0.5           OJ.D1     VC.D1             OJ.D2
VC.D0.5 0.00285           -                 -         -                 -
OJ.D1   0.00000079391014  0.00000000000984  -         -                 -
VC.D1   0.04207           0.00000243821908  0.00088   -                 -
OJ.D2   0.00000000042891  0.00000000000001  0.04645   0.00000089414918  -
VC.D2   0.00000000042891  0.00000000000001  0.04474   0.00000085310153  0.96089
the adjusted p-value for VC.D1 vs OJ.D1 is 0.00088.
However, a single t.test
t.test(df[df$supp == "VC" & df$dose == "D1", ]$len,
df[df$supp == "OJ" & df$dose == "D1", ]$len)
gives p-value = 0.001038.
So I must have messed up somewhere: shouldn't an adjusted p-value be greater than a single unadjusted p-value?
Solution
You'll get the same results when you set p.adjust.method = "none" and pool.sd = FALSE:
pairwise.t.test(df$len, df$int, p.adjust.method = "none", pool.sd = FALSE)$p.value[3,3]
# 0.001038376
t.test(df[df$supp == "VC" & df$dose == "D1", ]$len,
df[df$supp == "OJ" & df$dose == "D1", ]$len)$p.value
# 0.001038376
Notes
Always read the documentation carefully and perform sanity checks, to make sure a function does what you think it does.
This only illustrates where the difference comes from. Which settings to use in your case depends on your familiarity with the data.
Explanation
The comparison becomes much easier if we do not apply any multiple-testing correction. In that case both should give the same p-value, right? So let's compare using p.adjust.method = "none". pairwise.t.test now gives 0.00059..., which is closer but still not the same.
The difference stems from the pool.sd argument, which defaults to !paired (so TRUE here) and forces a common standard deviation, estimated from all groups, to be used in every comparison. This is useful when the equal-variance assumption holds, but it does lead to different p-values than a stand-alone t.test().
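To check that the BH adjustment itself behaves as expected (adjusted p-values are never smaller than the raw ones from the same test), the two settings can be compared directly. This sketch assumes that df is the built-in ToothGrowth data with dose relabelled, which its values appear to match; the level names therefore read VC.1 and OJ.1 rather than VC.D1 and OJ.D1.

```r
# df in the question appears to be ToothGrowth with dose relabelled;
# the built-in data is used here so the example runs on its own.
g <- interaction(ToothGrowth$supp, ToothGrowth$dose)

raw <- pairwise.t.test(ToothGrowth$len, g, p.adjust.method = "none")$p.value
bh  <- pairwise.t.test(ToothGrowth$len, g, p.adjust.method = "BH")$p.value

# The lower triangle holds the 15 comparisons; the upper triangle is NA
keep <- !is.na(raw)
all(bh[keep] >= raw[keep])  # BH never lowers a p-value

# Both use the pooled SD, so both differ from a stand-alone Welch t.test()
raw["VC.1", "OJ.1"]
bh["VC.1", "OJ.1"]
```

So the adjusted value 0.00088 is indeed larger than its own unadjusted counterpart; the mismatch with t.test() comes from the pooled SD, not from the adjustment.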
When we look at the underlying code, this becomes clear:
if (pool.sd) {
    METHOD <- "t tests with pooled SD"
    xbar <- tapply(x, g, mean, na.rm = TRUE)
    s <- tapply(x, g, sd, na.rm = TRUE)
    n <- tapply(!is.na(x), g, sum)
    degf <- n - 1
    total.degf <- sum(degf)
    pooled.sd <- sqrt(sum(s^2 * degf)/total.degf)
    compare.levels <- function(i, j) {
        dif <- xbar[i] - xbar[j]
        se.dif <- pooled.sd * sqrt(1/n[i] + 1/n[j])
        t.val <- dif/se.dif
        if (alternative == "two.sided")
            2 * pt(-abs(t.val), total.degf)
        else pt(t.val, total.degf, lower.tail = (alternative ==
            "less"))
    }
}
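Among other things, the degrees of freedom of all groups are summed (total.degf) and used to weight a pooled standard deviation across all groups (pooled.sd). total.degf is also used as the degrees of freedom of the t distribution, so every comparison borrows variance information from groups that are not part of it.
When we set pool.sd = FALSE, the code simply calls t.test() for each pair, so by default each comparison is a Welch test, like the stand-alone t.test() call: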
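The pooled branch can be recomputed by hand for the VC.1 vs OJ.1 comparison. A sketch, again assuming the built-in ToothGrowth data stands in for df: the hand-computed p-value matches the pairwise.t.test() cell and is smaller than the Welch p-value, because the pooled SD uses all six groups and the test has 54 rather than about 15 degrees of freedom.

```r
x <- ToothGrowth$len
g <- interaction(ToothGrowth$supp, ToothGrowth$dose)

# Same steps as the pool.sd = TRUE branch shown above
xbar <- tapply(x, g, mean)
s    <- tapply(x, g, sd)
n    <- tapply(x, g, length)
degf <- n - 1
total.degf <- sum(degf)                          # 6 groups x 9 = 54
pooled.sd  <- sqrt(sum(s^2 * degf) / total.degf)

t.val  <- (xbar["VC.1"] - xbar["OJ.1"]) /
          (pooled.sd * sqrt(1 / n["VC.1"] + 1 / n["OJ.1"]))
p.pool <- unname(2 * pt(-abs(t.val), total.degf))

p.pw    <- pairwise.t.test(x, g, p.adjust.method = "none")$p.value["VC.1", "OJ.1"]
p.welch <- t.test(x[g == "VC.1"], x[g == "OJ.1"])$p.value

c(pooled = p.pool, pairwise = p.pw, welch = p.welch)
```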
else {
    METHOD <- if (paired)
        "paired t tests"
    else "t tests with non-pooled SD"
    compare.levels <- function(i, j) {
        xi <- x[as.integer(g) == i]
        xj <- x[as.integer(g) == j]
        t.test(xi, xj, paired = paired, alternative = alternative,
            ...)$p.value
    }
}
Is there a way to make a file with the correlation statistic between the raw number of fish observed ("num") and each environmental data column ("temp", "do", etc.) by species ("group")?
I would also like the correlations between the means and medians of num and the environmental factors.
I'd also like to be able to choose which correlation method to use (Pearson correlation, Kendall rank correlation, Spearman correlation, etc.)
My data:
zeros <- structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("2019", "2020"), class = "factor"), season = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("dry", "wet"), class = "factor"),
site = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 1L, 1L, 2L, 2L, 3L,
3L, 4L, 4L, 5L, 5L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L
), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
group = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L
), .Label = c("Hardhead silverside", "Sailfin molly"), class = "factor"),
num = c(0, 8, 0, 9, 0, 13, 0, 9, 0, 10, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 7, 0, 2,
0, 3, 0, 13, 0), temp = c(23L, 36L, 35L, 34L, 30L, 28L, 18L,
19L, 33L, 33L, 25L, 20L, 33L, 23L, 36L, 32L, 28L, 17L, 34L,
31L, 26L, 34L, 26L, 35L, 15L, 25L, 26L, 20L, 18L, 14L, 23L,
17L, 26L, 17L, 17L, 19L, 29L, 31L, 18L, 15L), sal = c(12.5,
25.5, 8.5, 15.5, 17.5, 27.5, 9.5, 31.5, 1.5, 34.5, 25.5,
21.5, 10.5, 8.5, 32.5, 19.5, 6.5, 5.5, 15.5, 28.5, 6.5, 3.5,
29.5, 13.5, 7.5, 16.5, 3.5, 28.5, 22.5, 5.5, 9.5, 12.5, 29.5,
24.5, 8.5, 32.5, 37.5, 3.5, 12.5, 19.5), do = c(9.66, 7.66,
1.66, 14.66, 15.66, 1.66, 14.66, 15.66, 0.66, 5.66, 10.66,
11.66, 4.66, 0.66, 13.66, 1.66, 13.66, 6.66, 6.66, 10.66,
9.66, 15.66, 9.66, 15.66, 4.66, 13.66, 1.66, 11.66, 6.66,
8.66, 12.66, 0.66, 6.66, 0.66, 9.66, 16.66, 1.66, 10.66,
15.66, 10.66), depth = c(120L, 161L, 52L, 52L, 43L, 105L,
165L, 23L, 79L, 136L, 41L, 59L, 65L, 118L, 122L, 69L, 137L,
88L, 152L, 105L, 108L, 79L, 96L, 80L, 22L, 110L, 157L, 118L,
126L, 93L, 156L, 64L, 74L, 24L, 111L, 113L, 157L, 78L, 121L,
130L)), class = "data.frame", row.names = c(NA, -40L))
The first part of your question is straightforward:
zeros.spl <- split(zeros, zeros$group)
zeros.cors <- sapply(zeros.spl, function(x) cor(x[, "num"], x[, 6:9]))
dimnames(zeros.cors)[[1]] <- colnames(zeros)[6:9]
zeros.cors
# Hardhead silverside Sailfin molly
# temp -0.3080334 0.36174046
# sal 0.1393580 0.47095129
# do 0.2544695 -0.06646818
# depth 0.1296208 0.08777425
t(zeros.cors)
# temp sal do depth
# Hardhead silverside -0.3080334 0.1393580 0.25446948 0.12962078
# Sailfin molly 0.3617405 0.4709513 -0.06646818 0.08777425
Use write.csv(zeros.cors, file="results.csv") or write.csv(t(zeros.cors), file="results.csv") depending on what you want the rows/cols to be.
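The code above always uses Pearson's correlation. cor() also accepts method = "pearson", "kendall" or "spearman", so the choice asked for in the question can be passed through. A minimal sketch; the helper name cors_by_group and the toy data are hypothetical, and the default column names assume the layout of zeros.

```r
# Hypothetical helper: correlation of num with each environmental column,
# computed separately for every level of group.
cors_by_group <- function(data, env_cols = c("temp", "sal", "do", "depth"),
                          method = c("pearson", "kendall", "spearman")) {
  method <- match.arg(method)
  spl <- split(data, data$group)
  # One column per group; drop = FALSE keeps a single env column as a data frame
  res <- do.call(cbind, lapply(spl, function(x)
    as.vector(cor(x$num, x[, env_cols, drop = FALSE], method = method))))
  rownames(res) <- env_cols
  res
}

# Toy data (made up) with two groups
toy <- data.frame(
  group = rep(c("A", "B"), each = 4),
  num   = c(1, 2, 3, 4, 4, 3, 2, 1),
  temp  = c(10, 20, 30, 40, 10, 20, 30, 40),
  sal   = c(1, 4, 9, 16, 5, 5, 6, 7)
)

cors_by_group(toy, env_cols = c("temp", "sal"))                     # Pearson
cors_by_group(toy, env_cols = c("temp", "sal"), method = "spearman")
```

With the question's data, something like write.csv(cors_by_group(zeros, method = "kendall"), file = "results_kendall.csv") would write one file per method.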
The second part of the question is not clear. The mean or median of num within a group is a single value, so it cannot be correlated with the environmental variables. You could compute the means by group with aggregate:
aggregate(zeros[, 5:9], by=list(zeros$group), "mean")
# Group.1 num temp sal do depth
# 1 Hardhead silverside 1.45 25.95 15.35 8.51 105.20
# 2 Sailfin molly 2.45 25.00 18.90 9.06 90.25
aggregate(zeros[, 5:9], by=list(zeros$group), "median")
# Group.1 num temp sal do depth
# 1 Hardhead silverside 0 26 11.5 9.66 115.5
# 2 Sailfin molly 0 24 19.5 10.66 90.5
Using the code from https://www.r-graph-gallery.com/84-tukey-test.html, I have been trying to add letters to my boxplot, but the aov() step does not work for my model because it is fitted with lmer instead of lm.
When I use anova() instead of aov(), the rest of the code does not work. Is there a substitution I can make that would work?
The cld() function (compact letter display) from the multcomp package can do this for many kinds of models, including lmer fits via glht().
library("lme4")
library("multcomp")
data <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("G1", "G2", "G3"), class = "factor"),
Subject = structure(c(1L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("S1",
"S10", "S11", "S12", "S13", "S14", "S15", "S2", "S3", "S4",
"S5", "S6", "S7", "S8", "S9"), class = "factor"), Value = c(9.83,
13.62, 13.2, 14.69, 9.27, 11.68, 14.65, 12.21, 11.58, 13.58,
12.49, 10.28, 12.22, 12.58, 15.43, 9.47, 11.47, 10.79, 10.66,
10.87, 12.98, 12.85, 8.67, 10.45, 13.62, 13.64, 12.46, 8.66,
10.66, 13.18, 11.97, 13.56, 11.83, 14.02, 11.38, 14.15, 13.22,
9.14, 11.66, 14.2, 14.18, 11.26, 11.98, 13.77, 11.57)),
row.names = c(NA, -45L), class = "data.frame")
model <- lmer(Value ~ Group + (1 | Subject), data = data)
tuk <- glht(model, linfct = mcp(Group = "Tukey"))
tuk.cld <- cld(tuk)
plot(tuk.cld)
The example was adapted from: https://stats.stackexchange.com/questions/237512/how-to-perform-post-hoc-test-on-lmer-model
For additional things regarding mixed models, please consider https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
I am struggling to get my x-axis tick labels to show the day each sample was taken. I am also struggling to reorder my groups: currently Afternoon comes before Pre-Dawn, and I would like Pre-Dawn to come first.
Data
http://www.sharecsv.com/s/f7079be36f5fc5035029ae105f96d560/VR_Sonde_Data_May_2017%20(1).csv
DO=read.csv("VR_Sonde_Data_May_2017 (1).csv")
DOmelt <- melt(DO, id.vars=c("Month", "Day", "TimeofDay"), measure.vars = c("AverageDO"))
ggplot(DOmelt, aes((x=Day), group=interaction(Month, TimeofDay), fill=TimeofDay)) +
geom_bar(aes(y=value), stat="identity", position=position_dodge()) +
facet_grid(~Month, scales = "free_x") +
ggtitle("Dissolved Oxygen in Ventura River") +
labs(subtitle = "2017") +
theme(plot.title = element_text(size=30, face="bold", vjust=2, hjust=.5), plot.subtitle = element_text(size=20, face="bold", vjust=2, hjust=.5))+
scale_x_discrete("day") +
scale_y_continuous(name ="Average Dissolved Oxygen")+
theme(axis.text.x =element_text(angle=90))
You can use the following code:
library(tidyverse)
DOmelt %>%
arrange(AverageDO) %>%
mutate(TimeofDay = factor(TimeofDay, levels=c("Pre-Dawn", "Afternoon"))) %>%
ggplot(aes(x=Day, y=AverageDO, group=interaction(Month, TimeofDay), fill=TimeofDay)) +
geom_bar(position=position_dodge(), stat="identity") +
facet_grid(~Month, scales = "free_x") +
ggtitle("Dissolved Oxygen in Ventura River") +
labs(subtitle = "2017") +
theme(plot.title = element_text(size=30, face="bold", vjust=2, hjust=.5), plot.subtitle = element_text(size=20, face="bold", vjust=2, hjust=.5))+
xlab("Day") +
scale_y_continuous(name ="Average Dissolved Oxygen")+
theme(axis.text.x =element_text(angle=90))
Data
DOmelt = structure(list(Month = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("May", "September"), class = "factor"),
Day = c(11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L,
16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 20L, 20L, 21L, 21L,
22L, 22L, 23L, 23L, 24L, 24L, 25L, 6L, 6L, 7L, 7L, 8L, 8L,
9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L,
15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 20L), TimeofDay = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Afternoon",
"Pre-Dawn"), class = "factor"), AverageDO = c(6.99, 12.24,
6.61, 12.05, 6.51, 11.94, 6.63, 12.12, 6.67, 12.28, 6.68,
12.14, 6.87, 11.94, 6.64, 10.77, 6.47, 9.3, 6.21, 10.71,
5.92, 10.95, 5.85, 11.46, 5.98, 11.31, 6.12, 10.27, 6.38,
6.61, 8.97, 6.88, 9.08, 7.01, 9.18, 7.2, 9.39, 7.25, 9.61,
6.97, 8.87, 6.77, 8.8, 6.88, 8.92, 7.1, 9.25, 7.34, 9.26,
7.44, 9.46, 7.59, 9.66, 7.74, 9.72, 7.77, 9.54, 7.71)), class = "data.frame", row.names = c(NA,
-58L))
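The reordering in the mutate() line works because ggplot2 orders the fill legend and the dodged bars by factor level, and factor() sorts levels alphabetically by default, which puts Afternoon first. A base-R sketch of just that step, on a made-up vector:

```r
tod <- c("Pre-Dawn", "Afternoon", "Pre-Dawn", "Afternoon")

levels(factor(tod))                                       # alphabetical default
levels(factor(tod, levels = c("Pre-Dawn", "Afternoon")))  # explicit order
```

For the axis, the answer keeps Day numeric and drops scale_x_discrete(). If a tick label for every sampled day is wanted, one option (untested here) is to map x = factor(Day), which gives a discrete axis with one break per day present in each facet.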
I am working with a dataset in which I need to compare ordinal data with continuous data in a different column, i.e., individuals were categorized (by age, actually) and I need to compare the different age ranges on two different test values. I have been attempting to run a multifactor ANOVA, with no luck.
First, I subset each age category and tried this:
aov.first.molar<-aov(carbon.combo~first.m.cat.1+first.m.cat.2+first.m.cat.3+first.m.cat.4+first.m.cat.5)
Error in model.frame.default(formula = carbon.combo ~ first.m.cat.1 + :
invalid type (list) for variable 'first.m.cat.1'
The subsets did not work, so I tried using the column names directly, to see whether aov() would organize the data by category on its own:
> aov.albania.first<-aov(albania$AgeCat_first~albania$juv_deltaC_dentine+albania$Adult_deltaC_collagen)
Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
> summary(aov.albania.first)
Error in levels(x)[x] : only 0's may be mixed with negative subscripts
That obviously didn't work either, and I am not sure what I am doing wrong. I set everything as a factor, and I don't understand why the code is not working.
I wonder whether it has something to do with my test values being negative. I am not sure how to fix that without altering the data.
Here is my data, as requested. I am sorry it is so messy; I am not sure how to format it better. Turning it into a matrix helped, but I am still having problems with aov and ggplot not finding certain variables that I already turned into factors...
structure(list(Number = structure(1:10, .Label = c("142-c-1",
"142-c-3", "142-c-5", "156-c-1", "156-c-4", "156-c-6", "157-c-1",
"157-c-3", "157-c-5", "157-c-6", "158-c-3", "158-c-6", "178-c-1/A",
"178-c-2/A", "178-c-2/b", "178-c-3/b", "178-c-4/b", "186-c-2/a",
"186-c-2/b", "186-c-3/b", "186-c-4/b", "186-c-5/b", "186-c-6/b",
"192-c-1", "192-c-2", "192-c-3", "192-c-4", "192-c-5", "205-c-1",
"205-c-2", "205-c-3", "205-c-4", "205-c-5", "205-c-6", "210-c-1",
"210-c-2", "210-c-3", "210-c-4", "210-c-5", "215-c-1", "215-c-2",
"215-c-3", "215-c-4", "215-c-5", "215-c-6", "215-c-7", "270-c-1",
"270-c-2", "270-c-3", "270-c-4", "270-c-5", "295-c-1", "295-c-3",
"295-c-4", "353-c-2", "353-c-3", "353-c-4", "353-c-5", "353-c-6",
"382-c-1", "390-c-1", "390-c-2", "390-c-3"), class = "factor"),
ToothID = structure(c(3L, 3L, 3L, 8L, 8L, 8L, 7L, 7L, 7L,
7L), .Label = c("LI2", "LM1", "LM1-2", "LM3", "LP1-2", "M2",
"RM1-2", "RM2"), class = "factor"), sex = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"),
Al.Qahtani.category = structure(c(2L, 5L, 8L, 2L, 5L, 8L,
2L, 6L, 7L, 8L), .Label = c("AC", "CR 1/2", "CR 3/4", "CRC",
"R 1/2", "R 1/4", "R 3/4", "RC", "Ri ", "unk"), class = "factor"),
AgeCat_first = structure(c(1L, 2L, 3L, 2L, 3L, 4L, 1L, 2L,
2L, 3L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
AgeCat_second = c(2L, 3L, 4L, 2L, 3L, 4L, 2L, 3L, 4L, 4L),
sample_age_first = structure(c(9L, 18L, 23L, 17L, 27L, 6L,
10L, 13L, 21L, 23L), .Label = c("10.5 to 16.5", "11.5 to 14.5",
"11.5 to 15.5", "11.5 to 18.5", "11.5 to 19.5", "12.5 to 15.5",
"12.5 to 19.5", "15.5 to 20.5", "1.5 to 2.5", "1.5 to 3.5",
"17.5 to 22.5", "2.5 to 4.5", "3.5 to 6.5", "3.5 to 7.5",
" 4.5 to 6.5 ", "4.5 to 6.5", "4.5 to 7.5", "4.5 to 8.5",
"6.5 to 11.5", "6.5 to 8.5", "6.5 to 9.5", "7.5 to 10.5",
"8.5 to 10.5", "8.5 to 11.5", "8.5 to 12.5", "9.5 to 12.5",
"9.5 to 13.5", "9.5 to 15.5", "unk"), class = "factor"),
sample_age_second = structure(c(16L, 25L, 7L, 15L, 26L, 7L,
15L, 22L, 2L, 7L), .Label = c("10.5 to 16.5", "11.5 to 13.5",
"11.5 to 14.5", "11.5 to 15.5", "11.5 to 18.5", "11.5 to 19.5",
"12.5 to 15.5", "12.5 to 19.5", "14.5 to 17.5", "15.5 to 20.5",
"1.5 to 3.5", "17.5 to 22.5", "3.5 to 6.5", "4.5 to 6.5",
"4.5 to 7.5", "4.5 to7.5", " 5.5 to 6.5 ", "6.5 to 11.5",
"6.5 to 8.5", "6.5 to 9.5", "7.5 to 11.5", "7.5 to 12.5",
"8.5 to 12.5", "9.5 to 12.5", "9.5 to12.5", "9.5 to 13.5",
"9.5 to 15.5", "unk"), class = "factor"), AgeCat_adult = c(9L,
9L, 9L, 8L, 8L, 8L, 7L, 7L, 7L, 7L), age_at_death = structure(c(3L,
3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("18-30",
"31-45", ">45", "Adolescent", "Ind"), class = "factor"),
weight_percent_.N = c(11.5, 6.6, 6.8, 7.8, 8.7, 9.4, 5.6,
5.6, 9.1, 3.9), weight_percent_C = c(37.8, 26.2, 29.5, 32.7,
34.7, 34.4, 22, 30.7, 46.8, 22.7), juv_deltaN_dentine = c(4.54,
4.45, NA, 4.03, 5.73, 6.81, 5.03, 4.58, 0.3, NA), juv_deltaC_dentine = c(-22.042,
-22.865, -24.345, -23.557, -23.24, -22.282, -22.85, -22.697,
-25.439, -25.776), juv_proxy = c(7.958, 7.135, 5.655, 6.443,
6.76, 7.718, 7.15, 7.303, 4.561, 4.224), Adult_deltaC_collagen = c(-18.62,
-18.62, -18.62, -18.9, -18.9, -18.9, -18.64, -18.64, -18.64,
-18.64), adult_proxy = c(11.38, 11.38, 11.38, 11.1, 11.1,
11.1, 11.36, 11.36, 11.36, 11.36), Adult_deltaC_apatite = c(12.29,
12.29, 12.29, -10.23, -10.23, -10.23, -10.73, -10.73, -10.73,
-10.73), Adult_deltaN = c(-18.62, -18.62, -18.62, -18.9,
-18.9, -18.9, -18.64, -18.64, -18.64, -18.64), apatite_collagen_spacing = c(8.66,
8.66, 8.66, 7.67, 7.67, 7.67, 7.74, 7.74, 7.74, 7.74), Adult_percent_C = structure(c(2L,
2L, 2L, 6L, 6L, 6L, 7L, 7L, 7L, 7L), .Label = c("14.31%",
"22.35%", "33.96%", "34.58%", "36.60%", "39.07%", "39.51%",
"42.12%", "42.17%", "42.29%", "42.81%", "44.01%", "44.72%",
"45.52%"), class = "factor"), Adult_percent_N = structure(c(14L,
14L, 14L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("12.16%",
"12.30%", "13.04%", "13.78%", "14.20%", "14.89%", "14.97%",
"15.13%", "15.18%", "15.66%", "15.85%", "16.10%", "4.60%",
"7.98%"), class = "factor"), Adult_CN_ratio = c(3.27, 3.27,
3.27, 3.31, 3.31, 3.31, 3.25, 3.25, 3.25, 3.25), delta_18O = c(-5.5,
-5.5, -5.5, -4.79, -4.79, -4.79, -5.39, -5.39, -5.39, -5.39
), CP = c(0.17, 0.17, 0.17, 0.21, 0.21, 0.21, 0.2, 0.2, 0.2,
0.2), IR_SF = c(3.33, 3.33, 3.33, 3.12, 3.12, 3.12, 3.19,
3.19, 3.19, 3.19), adult_bone_sampled = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("femur", "humerus",
"occipital", "temporal", "tibia"), class = "factor")), .Names = c("Number",
"ToothID", "sex", "Al.Qahtani.category", "AgeCat_first", "AgeCat_second",
"sample_age_first", "sample_age_second", "AgeCat_adult", "age_at_death",
"weight_percent_.N", "weight_percent_C", "juv_deltaN_dentine",
"juv_deltaC_dentine", "juv_proxy", "Adult_deltaC_collagen", "adult_proxy",
"Adult_deltaC_apatite", "Adult_deltaN", "apatite_collagen_spacing",
"Adult_percent_C", "Adult_percent_N", "Adult_CN_ratio", "delta_18O",
"CP", "IR_SF", "adult_bone_sampled"), row.names = c(NA, 10L), class = "data.frame")
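Your data corresponds to the second attempt, and so does this answer.
aov() models a numeric response as depending on the categories, so the formula has to be written as response ~ factor: the numeric variable goes on the left and the grouping factor on the right. The call in the question had them the other way round, which is why R warned about a factor response. Note that on the left-hand side, juv_deltaC_dentine + Adult_deltaC_collagen is evaluated as ordinary addition, so the model below analyses the sum of the two values.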
aov.albania.first <- aov(juv_deltaC_dentine + Adult_deltaC_collagen ~ AgeCat_first,
data = albania)
summary(aov.albania.first)
Df Sum Sq Mean Sq F value Pr(>F)
AgeCat_first 3 6.480 2.160 1.667 0.272
Residuals 6 7.773 1.296
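The problem in the first attempt is probably similar. In addition, check str(first.m.cat.1): the error says it is a list (for example, a data frame returned by subsetting), whereas aov() needs each variable in the formula to be a vector, so extract the column you need.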
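If the two values should be compared across age categories separately rather than as their sum, one option is a separate one-way ANOVA per response. A minimal sketch: the data frame below holds only the three relevant columns, copied from the dput() above, and the names responses, fits and summed are illustrative.

```r
# Only the three columns needed, copied from the dput() of albania above
albania <- data.frame(
  AgeCat_first = factor(c(1, 2, 3, 2, 3, 4, 1, 2, 2, 3), levels = 1:5),
  juv_deltaC_dentine = c(-22.042, -22.865, -24.345, -23.557, -23.24,
                         -22.282, -22.85, -22.697, -25.439, -25.776),
  Adult_deltaC_collagen = c(-18.62, -18.62, -18.62, -18.9, -18.9,
                            -18.9, -18.64, -18.64, -18.64, -18.64)
)

# One ANOVA per numeric response, always in the form response ~ factor
responses <- c("juv_deltaC_dentine", "Adult_deltaC_collagen")
fits <- lapply(responses, function(v) {
  f <- reformulate("AgeCat_first", response = v)
  aov(f, data = albania)
})
names(fits) <- responses
lapply(fits, summary)

# For comparison: the model above, whose response is the sum of the two
summed <- aov(juv_deltaC_dentine + Adult_deltaC_collagen ~ AgeCat_first,
              data = albania)
summary(summed)
```

Level "5" of AgeCat_first does not occur in these rows, so aov() drops it and each model has 3 and 6 degrees of freedom, as in the output above.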