Cannot get Repeated measures ANOVA in RStudio to work - r

I am trying to work out how to conduct a repeated measures ANOVA. My data are structured as follows:
means <- structure(list(col = c("c", "v1", "b1", "v2", "b2"),
`1` = c(8.55,9.73, 8.93, 9.52, 9.91),
`2` = c(8.4, 9.97, 9.08, 9.66, 9.97),
`3` = c(8.48, 10.04, 9.13, 9.73, 10.04),
`4` = c(8.42, 9.63,8.9, 9.34, 9.82),
`5` = c(8.42, 9.59, 8.87, 9.39, 9.69),
`6` = c(8.52, 9.74, 9.02, 9.58, 9.84),
`7` = c(8.37, 9.67,8.98, 9.47, 9.74),
`8` = c(8.42, 9.67, 9.02, 9.52, 9.77),
`9` = c(8.56, 9.79, 9.36, 9.6, 9.78),
`10` = c(8.44, 9.63,9.15, 9.52, 9.67),
`11` = c(8.3, 9.58, 9.05, 9.49, 9.63),
`12` = c(8.03, 9.33, 8.82, 9.23, 9.38),
`13` = c(7.95, 9.08, 8.7, 9.04, 9.19),
`14` = c(8, 8.34, 8.37, 8.43, 8.54),
`15` = c(8.04,8.26, 8.4, 8.45, 8.61),
`16` = c(8.08, 8.09, 8.18, 8.16,8.28),
`17` = c(7.99, 8.06, 8.09, 8.15, 8.26),
`18` = c(8.06, 8.06, 8.09, 8.1, 8.22),
`19` = c(7.96, 7.96, 7.99, 8.03, 8.1),
`20` = c(7.96, 7.98, 7.99, 7.99, 8.11),
`21` = c(8.16, 8.22, 8.22, 8.26, 8.33),
`22` = c(8.08, 8.16, 8.13, 8.2, 8.2),
`23` = c(7.94, 7.97, 7.94, 7.98, 8.07),
`24` = c(8.02,8.03, 8, 8.08, 8.1),
`25` = c(8.03, 8.08, 8.09, 8.12, 8.15),
`26` = c(7.92, 7.95, 7.95, 7.96, 7.98)),
.Names = c("col","1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23","24", "25", "26"),
class = c("data.table", "data.frame"))
where "col" represents the different substrates (treatments) and the numbers in the header are measurements over time. This is only part of the data.
To conduct the repeated measures ANOVA (which is hopefully the right statistical test), I tried to follow several examples I found online, e.g. http://rtutorialseries.blogspot.de/2011/02/r-tutorial-series-one-way-repeated.html
library(car)  # provides Anova()
# step 1: define the levels
levels <- c(1:26)
# define the factor
factor <- as.factor(levels)
# define the frame
frame <- data.frame(factor)
# bind the columns
bind <- cbind (means$`1`,means$`2`,means$`3`,means$`4`,means$`5`,means$`6`,means$`7`,means$`8`,means$`9`,means$`10`,means$`11`,means$`12`,means$`13`,means$`14`,means$`15`,means$`16`,means$`17`,means$`18`,means$`19`,means$`20`,means$`21`,means$`22`,means$`23`,means$`24`,means$`25`,means$`26`)
# define the model
model <- lm(ph_bind ~ 1)
# ANOVA
analysis <- Anova(model, idata=frame, idesign= ~factor)
This results in:
> analysis <- Anova(model, idata = factor, idesign = ~factor)
Warning message:
In Anova.lm(model, idata = factor, idesign = ~factor) :
the model contains only an intercept: Type III test substituted
> summary (analysis)
     Sum Sq              Df          F value          Pr(>F)
 Min.   :  61.59   Min.   :  1   Min.   :20519   Min.   :0
 1st Qu.:2495.43   1st Qu.: 33   1st Qu.:20519   1st Qu.:0
 Median :4929.27   Median : 65   Median :20519   Median :0
 Mean   :4929.27   Mean   : 65   Mean   :20519   Mean   :0
 3rd Qu.:7363.10   3rd Qu.: 97   3rd Qu.:20519   3rd Qu.:0
 Max.   :9796.94   Max.   :129   Max.   :20519   Max.   :0
                                 NA's   :1       NA's   :1
This is not the output I was hoping for. What am I doing wrong?
Grateful for any help:)
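One thing that stands out in the output above: a table of Min., 1st Qu., Median and so on is what summary() prints for a data frame, not an ANOVA table. Here is a minimal sketch of that effect. It uses base R's anova() and the built-in PlantGrowth data as stand-ins, which is an assumption; the question uses car::Anova on its own data.

```r
# Stand-in example: base anova() on built-in data, not the asker's model.
fit <- lm(weight ~ group, data = PlantGrowth)
tab <- anova(fit)

# The ANOVA table is stored as a data frame with an extra "anova" class.
class(tab)

# Printing the object shows the actual table (Df, Sum Sq, F value, Pr(>F)).
print(tab)

# summary() treats it as a data frame and summarises each column,
# which gives Min./Median/Max. rows like those in the question.
col_summary <- summary(tab)
print(col_summary)
```

If car::Anova behaves the same way here, printing analysis directly rather than calling summary(analysis) should show the table. Separately, the warning mentions Anova.lm, which suggests the response was a single vector rather than a matrix with one column per time point, so idata and idesign may not have been used as intended.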

Related

Error running two way mixed model ANOVA in loop but not when column of data specifically identified

The loop below generates the following error (shown first):
Error:
! Can't subset columns with `df_numeric[[column]]`.
✖ Can't convert from `df_numeric[[column]]` <double> to <integer> due to loss of precision.
df_numeric <- df[, sapply(df, is.numeric)]
for (column in names(df_numeric)) {
res.aov <- anova_test(data = df, dv= df_numeric[[column]], wid = `Subject`, within = `Timepoint`, between = `Genotype`)
get_anova_table(res.aov)
}
But when I take the anova call out of the loop and name the column from my dataframe explicitly, it produces the proper anova table:
res.aov <- anova_test(data = df, dv= `Tregs CD127lo CD25+`, wid = `Subject`, within = `Timepoint`, between = `Genotype`)
get_anova_table(res.aov)
I have also tried using df_numeric$column.
Dataframe
library(rstatix)
dput(df_numeric)
structure(list(`Tregs CD127lo CD25+` = c(2702, 2175, 2651, 1672.8,
3762, 4264, 1975, 3208, 3285, 3457, 3383, 2619.9, 11872, 16101,
13443, 3935, 1894, 2297, 7385, 8901, 9522, 7100, 8789, 9309,
371, 379, 514), `Monocytes % of Live by Size` = c(1.38, 2.66,
4.74, 5.83, 3.9, 5.06, 6.36, 3.45, 2.64, 6.33, 10.7, 9.41, 3.42,
3.46, 2.73, 2.38, 3.12, 4.44, 5.31, 3.59, 4.91, 1.53, 6.54, 4.85,
6.87, 3.66, 5.07), `NK cells` = c(90.62, 153.6, 159.8, 88, 118,
159, 74, 82, 64, 30, 344, 73, 29, 198, 79, 145, 258, 307, 30,
74.4, 0, 47.3, 32, 0, 52.6, 95.3, 51.7)), row.names = c(NA, -27L
), class = c("tbl_df", "tbl", "data.frame"))
> dput(df)
structure(list(Subject = c("ASCVD002", "ASCVD002", "ASCVD002",
"ASCVD003", "ASCVD003", "ASCVD003", "ASCVD004", "ASCVD004", "ASCVD004",
"ASCVD005", "ASCVD005", "ASCVD005", "ASCVD006", "ASCVD006", "ASCVD006",
"ASCVD008", "ASCVD008", "ASCVD008", "ASCVD009", "ASCVD009", "ASCVD009",
"ASCVD010", "ASCVD010", "ASCVD010", "ASCVD011", "ASCVD011", "ASCVD011"
), Timepoint = c("0", "0.25", "0.5", "0", "0.25", "0.5", "0",
"0.25", "0.5", "0", "0.25", "0.5", "0", "0.25", "0.5", "0", "0.25",
"0.5", "0", "0.25", "0.5", "0", "0.25", "0.5", "0", "0.25", "0.5"
), Genotype = c("Heterozygote", "Heterozygote", "Heterozygote",
"Heterozygote", "Heterozygote", "Heterozygote", "Heterozygote",
"Heterozygote", "Heterozygote", "GG", "GG", "GG", "AA", "AA",
"AA", "GG", "GG", "GG", "AA", "AA", "AA", "AA", "AA", "AA", "GG",
"GG", "GG"), `Tregs CD127lo CD25+` = c(2702, 2175, 2651, 1672.8,
3762, 4264, 1975, 3208, 3285, 3457, 3383, 2619.9, 11872, 16101,
13443, 3935, 1894, 2297, 7385, 8901, 9522, 7100, 8789, 9309,
371, 379, 514), `Monocytes % of Live by Size` = c(1.38, 2.66,
4.74, 5.83, 3.9, 5.06, 6.36, 3.45, 2.64, 6.33, 10.7, 9.41, 3.42,
3.46, 2.73, 2.38, 3.12, 4.44, 5.31, 3.59, 4.91, 1.53, 6.54, 4.85,
6.87, 3.66, 5.07), `NK cells` = c(90.62, 153.6, 159.8, 88, 118,
159, 74, 82, 64, 30, 344, 73, 29, 198, 79, 145, 258, 307, 30,
74.4, 0, 47.3, 32, 0, 52.6, 95.3, 51.7)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -27L))
Thank you for providing the dput and code. You are already passing the df dataset to anova_test, so you don't really need the df_numeric dataset. To get the names of all your numeric columns, you can use names(df)[unlist(lapply(df, is.numeric))]. Also, you passed a vector of values to the dv argument; it should only be the name of the column.
The code below should work for you; it did for me:
for (column in names(df)[unlist(lapply(df, is.numeric))]) {
res.aov <- rstatix::anova_test(data = df, dv = column,
wid = `Subject`, within = `Timepoint`, between = `Genotype`)
rstatix::get_anova_table(res.aov)
}
Note that in your loop you are overwriting res.aov on each iteration and not storing the results from get_anova_table(res.aov); inside a for loop the table is not printed automatically either. I would suggest storing both in lists:
nnames <- names(df)[unlist(lapply(df, is.numeric))]
res.aov <- list()
aov_tab <- list()
for (column in nnames) {
res.aov[[column]] <- rstatix::anova_test(data = df, dv = column,
wid = `Subject`, within = `Timepoint`, between = `Genotype`)
aov_tab[[column]] <- rstatix::get_anova_table(res.aov[[column]])
}
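The same pattern (loop over column names, keep each result in a named list) can be tried without rstatix. This is only a sketch of the looping and storing idea: the toy data are made up, and a plain one-way aov() stands in for the mixed design, so it is not a replacement for anova_test.

```r
# Toy data (made-up values): two numeric outcomes with non-syntactic names.
toy <- data.frame(
  Subject   = rep(c("s1", "s2", "s3", "s4"), each = 3),
  Timepoint = rep(c("0", "0.25", "0.5"), times = 4),
  `NK cells` = c(90, 153, 160, 88, 118, 159, 74, 82, 64, 30, 344, 73),
  `Tregs CD25+` = c(2702, 2175, 2651, 1673, 3762, 4264,
                    1975, 3208, 3285, 3457, 3383, 2620),
  check.names = FALSE
)

# Names (character strings) of the numeric columns, not their values.
numeric_cols <- names(toy)[vapply(toy, is.numeric, logical(1))]

fits <- list()
for (column in numeric_cols) {
  # as.name() keeps names containing spaces intact on the left-hand side.
  f <- reformulate("Timepoint", response = as.name(column))
  # Stand-in model: a one-way aov, used only to show the storing pattern.
  fits[[column]] <- summary(aov(f, data = toy))
}

# One stored result per numeric column, retrievable by column name.
names(fits)
fits[["NK cells"]]
```

Indexing the list by column name afterwards avoids rerunning the loop to look at a single outcome.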

How to make hierarchical cluster pheatmap in r?

I used this code to make a hierarchically clustered heatmap, but no colour appears:
library(tidyverse)
Mydata <- structure(list(Location = c("Karnaphuli River", "Sangu River", "Kutubdia Channel", "Moheshkhali Channel", "Bakkhali River", "Naf River", "St. Martin's Island", "Mean "), Cr = c(114.92, 2.75, 18.88, 27.6, 39.5, 12.8, 17.45, 33.41), Pb = c(31.29, 26.42, 52.3, 59.45, 34.65, 12.8, 9.5, 32.34), Cu = c(9.48, 54.39, 52.4, 73.28, 76.26, 19.48, 8.94, 42.03), Zn = c(66.2, 71.17, 98.7, 95.3, 127.84, 27.76, 21.78, 72.67), As = c(89.67, 9.85, 8.82, 18.54, 15.38, 7.55, 16.45, 23.75), Cd = c(1.06, 0, 0.96, 2.78, 3.12, 0.79, 0.45, 1.53)), class = "data.frame", row.names = c(NA, -8L))
library(pheatmap)
Mydata %>% column_to_rownames(var = "Location") %>%
as.matrix() %>% pheatmap(Mydata, cutree_cols = 6)
You don't need to pass the data again when using pipes; the pipe already supplies it as the first argument. Try:
library(pheatmap)
Mydata %>%
column_to_rownames(var = "Location") %>%
as.matrix() %>% pheatmap(cutree_cols = 6)
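To see why the extra argument matters, here is a small sketch with a hypothetical stand-in function, toy_heatmap, which is not part of pheatmap. It assumes the real function takes the matrix first and the colour palette second, and it writes out the calls the pipe is evaluated as, so no packages are needed.

```r
# Hypothetical stand-in, NOT pheatmap: it only reports what it received.
# Assumed argument order: the matrix first, then the colour palette.
toy_heatmap <- function(mat, color = "default palette", cutree_cols = NA) {
  list(mat_is_matrix    = is.matrix(mat),
       color_is_default = identical(color, "default palette"))
}

m <- matrix(1:6, nrow = 2)

# m %>% toy_heatmap(m, cutree_cols = 2) is evaluated as:
passed_twice <- toy_heatmap(m, m, cutree_cols = 2)

# m %>% toy_heatmap(cutree_cols = 2) is evaluated as:
passed_once <- toy_heatmap(m, cutree_cols = 2)

passed_twice$color_is_default  # FALSE: the second copy of the data landed in 'color'
passed_once$color_is_default   # TRUE: the default palette is kept
```

Under that assumption, the second copy of the data replaces the palette, which would explain a heatmap with no usable colours.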

How to create unique vectors for large dataset

I am trying to compute the Atkinson index for individual countries across more than 11,000 observations. I have the decile measures for each observation, so I can create an individual vector, e.g. c(d1, d2, ..., d10), for a single observation and compute the Atkinson index from it, but I am sure there is a quicker way to do this across 11,000 observations. Is there a way to have R build such a vector for every observation, using the deciles specific to that observation?
I am still rather new to coding in R, but I have tried to see whether some kind of loop could return the vector of deciles corresponding to each individual observation. For a single observation I do this:
id2 <- c(3.86, 5.29, 6.38, 7.32, 8.38, 9.35, 10.82, 12.47, 14.90, 21.22)
atkinson(id2, epsilon = 1)
[1] 0.1079504
Here is what I get when I type:
dput(head(data))
structure(list(id = c(1, 2, 3, 4, 5, 6), country = c("Afghanistan",
"Albania", "Albania", "Albania", "Albania", "Albania"), c3 = c("AFG",
"ALB", "ALB", "ALB", "ALB", "ALB"), d1 = c(NA, 0, 3.49, 3.48,
3.73, 3.66), d2 = c(NA, 5.29, 4.86, 4.92, 5.14, 5.19), d3 = c(NA,
6.38, 5.84, 5.98, 6.09, 6.14), d4 = c(NA, 7.32, 6.74, 6.92, 6.98,
7.03), d5 = c(NA, 8.38, 7.65, 7.99, 7.91, 8.08), d6 = c(NA, 9.35,
8.84, 9.04, 8.92, 9.26), d7 = c(NA, 10.82, 10.23, 10.37, 10.3,
10.52), d8 = c(NA, 12.47, 11.98, 12.13, 11.93, 12.29), d9 = c(NA,
14.9, 14.93, 14.83, 14.54, 14.89), d10 = c(NA, 21.22, 25.44,
24.34, 24.46, 22.93)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I could do this over 11,000 times, but obviously that would take a while. Is there a way to get R (with a loop?) to do something along these lines for each individual observation?
Consider a row-wise calculation with apply() to assign a new column to the data frame. Below, as.vector() turns each row's decile values into the plain vector that atkinson() needs.
data$atkinson_index <- apply(data[4:ncol(data)], MARGIN=1,
function(x) atkinson(as.vector(x), epsilon = 1)
)
data
Should NA values pose a problem, wrap the call in tryCatch():
data$atkinson_index <- apply(data[4:ncol(data)], MARGIN=1,
function(x) tryCatch(atkinson(as.vector(x), epsilon = 1),
error = function(e) NA)
)
data
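For anyone who wants to try the row-wise pattern without installing anything, here is a self-contained sketch. The helper atkinson_eps1 is hypothetical and is not the package's atkinson(); it assumes the standard epsilon = 1 form of the index, 1 - (geometric mean / arithmetic mean). The toy data frame only mimics the shape of the question's data, and its first row reuses the id2 values.

```r
# Hypothetical helper, NOT the package function: Atkinson index for
# epsilon = 1, i.e. 1 - (geometric mean / arithmetic mean).
atkinson_eps1 <- function(x) {
  if (anyNA(x)) return(NA_real_)   # rows with missing deciles stay NA
  1 - exp(mean(log(x))) / mean(x)
}

# Toy data in the same shape as the question: id, country, c3, d1..d10.
deciles <- data.frame(
  id      = 1:3,
  country = c("A", "B", "C"),
  c3      = c("AAA", "BBB", "CCC"),
  rbind(
    c(3.86, 5.29, 6.38, 7.32, 8.38, 9.35, 10.82, 12.47, 14.90, 21.22),
    rep(10, 10),          # perfectly equal shares
    rep(NA_real_, 10)     # an observation with no decile data
  )
)
names(deciles)[4:13] <- paste0("d", 1:10)

# Pick the decile columns by name rather than by position.
decile_cols <- grep("^d[0-9]+$", names(deciles))

# MARGIN = 1 hands each row to the helper as a numeric vector.
deciles$atkinson_index <- apply(deciles[decile_cols], 1, atkinson_eps1)
deciles[c("country", "atkinson_index")]
```

The first row should come out close to the 0.1079504 quoted in the question, the equal-shares row close to 0, and the all-NA row as NA.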

Trouble trying to clean a character vector in R data frame (UTF-8 encoding issue)

I'm having some issues cleaning up a dataset after manually extracting the data online; I'm guessing these are encoding issues. I am trying to remove the "U+00A0" in the cells of the "Athlete" column, along with the angle brackets around it. I looked up the code point and it is "NO-BREAK SPACE". I'm also not sure how to replace the other codes to make the names legible, e.g. getting U+008A to display as Š.
Subset of data
head2007decathlon <- structure(list(Rank = 1:6, Athlete = c("<U+00A0>Roman <U+008A>ebrle<U+00A0>(CZE)", "<U+00A0>Maurice Smith<U+00A0>(JAM)", "<U+00A0>Dmitriy Karpov<U+00A0>(KAZ)", "<U+00A0>Aleksey Drozdov<U+00A0>(RUS)", "<U+00A0>Andr<e9> Niklaus<U+00A0>(GER)", "<U+00A0>Aleksey Sysoyev<U+00A0>(RUS)"), Total = c(8676L, 8644L, 8586L, 8475L, 8371L, 8357L), `100m` = c(11.04, 10.62, 10.7, 10.97, 11.12, 10.8), LJ = c(7.56, 7.5, 7.19, 7.25, 7.42, 7.01), SP = c(15.92, 17.32, 16.08, 16.49, 14.12, 16.16), HJ = c(2.12, 1.97, 2.06, 2.12, 2.06, 2.03), `400m` = c(48.8, 47.48, 47.44, 50, 49.4, 48.42), `110mh` = c(14.33, 13.91, 14.03, 14.76, 14.51, 14.59), DT = c(48.75, 52.36, 48.95, 48.62, 44.48, 49.76), PV = c(4.8, 4.8, 5, 5, 5.3, 4.9), JT = c(71.18, 53.61, 59.84, 65.51, 63.28, 57.75), `1500m` = c(275.32, 273.52, 279.68, 276.93, 272.5, 276.16), Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2007", class = "factor"), Nationality = c(NA, NA, NA, NA, NA, NA)), .Names = c("Rank", "Athlete", "Total", "100m", "LJ", "SP", "HJ", "400m", "110mh", "DT", "PV", "JT", "1500m", "Year", "Nationality"), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
This is what I've tried so far, without success:
1) head2007decathlon$Athlete <- gsub(pattern="\U00A0",replacement="",x=head2007decathlon$Athlete)
2) head2007decathlon$Athlete <- gsub(pattern="<U00A0>",replacement="",x=head2007decathlon$Athlete)
3) head2007decathlon$Athlete <- iconv(head2007decathlon$Athlete, from="UTF-8", to="LATIN1")
4) Encoding(head2007decathlon$Athlete) <- "UTF-8"
5) head2007decathlon$Athlete<- enc2utf8(head2007decathlon$Athlete)
The following removes the no-break space:
head2007decathlon$Athlete <- gsub(pattern="<U\\+00A0>",replacement="",x=head2007decathlon$Athlete)
I'm not sure how to convert the other characters. One problem could be that the codes are not in a format that R recognises as UTF-8.
One example:
iconv('\u008A', from="UTF-8", to="LATIN1")
This does seem to have an effect, unlike trying to convert the literal text U+008A, although the output is:
[1] "\x8a"
which is not the character you want. Hope this helps somehow.
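Since the codes in this column are literal text (the characters "<", "U", "+" and so on) rather than real encoded characters, one workable route is a literal find-and-replace table. A minimal sketch, assuming the mapping below is what was intended: byte 0x8A is Š (U+0160) in Windows-1252, which is a guess at how the data were originally encoded, and the table would need extending for other codes in the full dataset.

```r
athlete <- c("<U+00A0>Roman <U+008A>ebrle<U+00A0>(CZE)",
             "<U+00A0>Andr<e9> Niklaus<U+00A0>(GER)")

# Assumed mapping from the literal escape text to the intended characters.
# <U+008A> is read as Windows-1252 byte 0x8A, i.e. S with caron (U+0160).
replacements <- c("<U+00A0>" = " ",
                  "<U+008A>" = "\u0160",
                  "<e9>"     = "\u00e9")

cleaned <- athlete
for (code in names(replacements)) {
  # fixed = TRUE matches the text literally, so "+" needs no escaping.
  cleaned <- gsub(code, replacements[[code]], cleaned, fixed = TRUE)
}

# The no-break spaces were turned into ordinary spaces; trim the outer ones.
cleaned <- trimws(cleaned)
cleaned
```

Replacing the no-break space with an ordinary space rather than an empty string keeps the gap between the name and the country code.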

Having issues using order function in R

My data.frame is stateData, and when I execute stateData[order(stateData$"heart failure"),], with heart failure being a column name, I get my data frame back with the heart failure column ordered like this:
10.0, 10.1, 10.3, 10.7, 15.0, 15.1, 15.9, 8.1, 8.3, 8.9, 9.0, 9.1
Here are the details from dput(head(stateData)):
heart failure = structure(c(97L, 44L, 25L, 6L, 52L, 57L ), .Label = c("10.0", "7.2", "7.3", "7.4", "7.5", "7.6", "7.7", "7.8", "7.9", "8.0", "8.1", "8.2", "8.3", "8.4", "8.5", "8.6", "8.7", "8.8", "8.9", "9.0", "9.1", "9.2", "9.3", "9.4", "9.5", "9.6", "9.7", "9.8", "9.9", "Not Available"), class = "factor"),
Why is it not sorting it all the way?
Any help is appreciated! Thank you!
Edit: Here is my solution! I got it, thanks for all of the advice!
stateData[,"heart failure"] <- as.numeric(levels(stateData[["heart failure"]])[stateData[["heart failure"]]])
sortedData <- stateData[order(stateData[,"heart failure"]),]
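The column stateData$"heart failure" is a factor, so order() sorts it by its levels, which are in alphabetical order ("10.0" comes before "8.1"). If you want the data sorted numerically, convert the column first; labels such as "Not Available" become NA, with a warning: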
stateData$"heart failure" <- as.numeric(levels(stateData$"heart failure"))[stateData$"heart failure"]
stateData[order(stateData$"heart failure"),]
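A small self-contained illustration of the difference, using a made-up factor rather than the real stateData column:

```r
# Made-up values standing in for the "heart failure" column.
hf <- factor(c("10.0", "8.1", "9.0", "15.1", "Not Available"))

# order() on a factor follows the level order, which is alphabetical here.
as.character(hf[order(hf)])
# "10.0" "15.1" "8.1" "9.0" "Not Available"

# Convert via the level labels; non-numeric labels become NA. The warning
# is expected, so it is suppressed in this sketch.
hf_num <- suppressWarnings(as.numeric(levels(hf))[hf])

# Numeric ordering; order() places NA last by default.
hf_num[order(hf_num)]
# 8.1 9.0 10.0 15.1 NA
```

Calling as.numeric(hf) directly would return the internal level codes (1, 2, 3, ...) instead of the values, which is why the conversion goes through levels().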
