Execute ANOVA & Follow-Up Tests on multiple columns - r

I'm attempting to perform a Hinkin and Tracey content validation on potential scale items and have the following dataset (sample) with 74 unique columns:
cleandata <- structure(list(Condition = c("RS", "AS", "BGPS", "APCS", "OP", "TS"
), alt_energy = c(2, 5, 3, 3, 2, 2), animal_product = c(5, 3,
4, 4, 3, 1), deforest = c(5, 1, 4, 1, 2, 1)), row.names = c(NA,
6L), class = c("tbl_df", "tbl", "data.frame"))
Only one column (Condition) is the "dimension," and I want to see if the mean responses for the remaining 73 columns significantly differ between conditions. Basically, I want to see if the scale item successfully reflects only 1 dimension.
I want to run an ANOVA and Tukey HSD on all the columns at once to get everything in one neat output:
test <- aov(as.matrix(cleandata[, -1]) ~ as.factor(Condition), data = cleandata)
summary(test, effect.size = "both", detailed = TRUE, observed = NULL)
But I'm unable to run the HSD follow-up:
TukeyHSD(test)
Getting the following error: Error in model.tables.aov(x, "means") :
'model.tables' is not implemented for multiple responses
Is there any way to loop the HSD so I get one clean output of ANOVA results and follow-up pairwise comparisons?
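One possible approach, as a minimal sketch rather than a definitive answer: fit a separate one-way aov per item, run TukeyHSD on each fit, and stack the pairwise comparisons into one table. The toy data frame below is hypothetical (the posted sample has only one row per condition, which leaves no residual degrees of freedom), and the names toy, items, results and tukey_table are made up for illustration.

```r
# Hypothetical toy data: several rows per condition so each ANOVA has residual df
set.seed(1)
toy <- data.frame(
  Condition      = factor(rep(c("RS", "AS", "BGPS"), each = 5)),
  alt_energy     = sample(1:5, 15, replace = TRUE),
  animal_product = sample(1:5, 15, replace = TRUE),
  deforest       = sample(1:5, 15, replace = TRUE)
)

# Every column except the grouping column is treated as an item
items <- setdiff(names(toy), "Condition")

# Fit one one-way ANOVA per item and keep its summary and Tukey HSD result
results <- lapply(items, function(item) {
  fit <- aov(reformulate("Condition", response = item), data = toy)
  list(anova = summary(fit), tukey = TukeyHSD(fit, "Condition"))
})
names(results) <- items

# Stack all pairwise comparisons into a single data frame
tukey_table <- do.call(rbind, lapply(items, function(item) {
  tk <- results[[item]]$tukey$Condition
  data.frame(item = item, comparison = rownames(tk), tk)
}))
rownames(tukey_table) <- NULL

print(tukey_table)
```

The per-item ANOVA tables stay available as results[[item]]$anova. With 73 items this runs many tests, so some correction for multiple comparisons across items may be worth considering.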

Related

Using LmFuncs (Linear Regression) in Caret for Recursive Feature Elimination: How do I fix "same number of samples in x and y" error?

I'm new to R and trying to isolate the best performing features from a data set of 247 columns (246 variables + 1 outcome), and 800 or so rows (where each row is one person's data) to create a predictive model.
I'm using caret to do RFE with lmFuncs. I need to use linear regression since the target variable is continuous.
I use the following to split into test/training data (which hasn't evoked errors)
inTrain <- createDataPartition(data$targetVar, p = .8, list = F)
train <- data[inTrain, ]
test <- data[-inTrain, ]
The resulting test and train sets look consistent, e.g. X and Y contain the same number of samples and all columns are the same length.
My control parameters are as follows (also runs without error)
control = rfeControl(functions = lmFuncs, method = "repeatedcv", repeats = 5, verbose = F, returnResamp = "all")
But when I run RFE I get an error message saying
Error in rfe.default(train[, -1], train[, 1], sizes = c(10, 15, 20, 25, 30), rfeControl = control) :
there should be the same number of samples in x and y
My code for RFE is as follows, with the target variable in first column
rfe_lm_profile <- rfe(train[, -1], train[, 1], sizes = c(10, 15, 20, 25, 30), rfeControl = control)
I've looked through various forums, but nothing seems to work.
This Google Groups thread suggests using an older version of caret, which I tried, but I got the same X/Y error: https://groups.google.com/g/rregrs/c/qwcP0VGn4ag?pli=1
Others suggest converting the target variable to a factor or matrix. This hasn't helped, and evokes
Warning message:
In createDataPartition(data$EBI_SUM, p = 0.8, list = F) :
Some classes have a single record
when partitioning the data into test/train, and the same X/Y sample error if you try to carry out RFE.
Mega thanks in advance :)
p.s
Here's the dput for the target variable (EBI_SUM) and a couple of variables
data <- structure(list(TargetVar = c(243, 243, 243, 243, 355, 355), Dosing = c(2,
2, 2, 2, 2, 2), `QIDS_1 ` = c(1, 1, 3, 1, 1, 1), `QIDS_2 ` = c(3,
3, 2, 3, 3, 3), `QIDS_3 ` = c(1, 2, 1, 1, 1, 2)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
The column names in your data object should not contain spaces:
library(caret)
data <- data.frame(
TargetVar = c(243, 243, 243, 243, 355, 355),
Dosing = c(2, 2, 2, 2, 2, 2),
QIDS_1 = c(1, 1, 3, 1, 1, 1),
QIDS_2 = c(3, 3, 2, 3, 3, 3),
QIDS_3 = c(1, 2, 1, 1, 1, 2)
)
inTrain <- createDataPartition(data$TargetVar, p = .8, list = F)
train <- data[inTrain, ]
test <- data[-inTrain, ]
control <- rfeControl(functions = lmFuncs, method = "repeatedcv", repeats = 5, verbose = F, returnResamp = "all")
rfe_lm_profile <- rfe(train[, -1], train[, 1], sizes = c(10, 15, 20, 25, 30), rfeControl = control)
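A further possible cause, offered as an assumption rather than something stated above: the dput in the question shows a tibble, and single-column indexing on a tibble keeps the tibble, so length() counts columns instead of rows. That would explain a mismatch between the number of rows in x and the length of y, and why rebuilding the data with data.frame() helps. The sketch below assumes the tibble package is installed; tb is a made-up example.

```r
library(tibble)

# Hypothetical three-row tibble standing in for the training data
tb <- tibble(TargetVar = c(243, 243, 355), Dosing = c(2, 2, 2))

# [, 1] on a tibble returns a one-column tibble, so length() counts columns
len_tibble_bracket <- length(tb[, 1])

# [[1]] extracts the column as a plain numeric vector
len_tibble_double <- length(tb[[1]])

# After conversion to a base data.frame, [, 1] drops to a vector again
len_df_bracket <- length(as.data.frame(tb)[, 1])

print(c(len_tibble_bracket, len_tibble_double, len_df_bracket))
```

If this is the cause, passing train[[1]] as the outcome, or converting with as.data.frame() before calling rfe, might be enough.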

Two-way repeated measure Anova R with ezANOVA error, One or more cells is missing data

I've created this minimal dataset for the example:
data_long <- data.frame(Subject = factor(c(1, 2, 3, 1, 2, 3)),
Trt = factor(c("T1","T2","T3","T1","T2","T3")),
Day = factor(c(7, 7, 7, 14, 14, 14)),
Value = c(7.6, 5.3, 8.6, 12.4, 11.2, 11))
But when I try to run a two-way repeated-measures ANOVA with ezANOVA, I get this error:
m2 <- ezANOVA(data = data_long, dv = Value, wid = Subject, within = c(Day,Trt))
Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within, :
One or more cells is missing data. Try using ezDesign() to check your data.
I definitely don't have missing data, but this error still occurs. Is there a way to fix that?
Thank you in advance,
Yemoloh
I think the problem is that each participant appears under only one level of the Trt factor, so most Subject-by-Trt cells are empty even though no values are missing.
You can see this by adding every participant to every condition (so that each participant is present for each Trt condition):
data_long <- data.frame(Subject = factor(rep(1:3, each = 6)),
Trt = factor(rep(c("T1", "T2", "T3"), times = 6)),
Day = factor(rep(c(7, 14), times = 3, each = 3)),
Value = rnorm(n = 18, mean = 6))
With this data structure you would be able to run the ANOVA as you specified it.
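To see the empty cells in the original data without running the ANOVA, a base R cross-tabulation is one quick check. This is only a sketch using the dataset from the question; cells is a made-up name.

```r
# Original dataset from the question
data_long <- data.frame(
  Subject = factor(c(1, 2, 3, 1, 2, 3)),
  Trt     = factor(c("T1", "T2", "T3", "T1", "T2", "T3")),
  Day     = factor(c(7, 7, 7, 14, 14, 14)),
  Value   = c(7.6, 5.3, 8.6, 12.4, 11.2, 11)
)

# Count observations for every Subject-by-Trt combination
cells <- with(data_long, table(Subject, Trt))
print(cells)

# TRUE means at least one subject never appears under some treatment,
# so Trt cannot be treated as a within-subject factor for this data
has_empty_cells <- any(cells == 0)
print(has_empty_cells)
```

If the design really has one treatment per subject, Trt may belong in the between argument of ezANOVA rather than within, though that depends on the actual experiment.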

Correlation between variables under the for loop

I have an issue, shown below, that I tried to solve without success. I have a data frame df1 and need to build a table of correlations between its variables within a for loop, because I do not want the code to be long and complicated.
df1 <- structure(list(a = c(1, 2, 3, 4, 5), b = c(3, 5, 7, 4, 3), c = c(3,
6, 8, 1, 2), d = c(5, 3, 1, 3, 5)), class = "data.frame", row.names =
c(NA, -5L))
I tried the code below, using two for loops:
fv <- as.data.frame(combn(names(df1),2,paste, collapse="&"))
colnames(fv) <- "ColA"
fv$ColB <- sapply(strsplit(fv$ColA,"\\&"),'[',1)
fv$ColC <- sapply(strsplit(fv$ColA,"\\&"),'[',2)
asd <- list()
for (i in fv$ColB) {
for (j in fv$ColC) {
asd[i,j] <- as.data.frame(cor(df1[,i],df1[,j]))}}
May I know what I am doing wrong?
We can apply cor directly to the data.frame and convert the result to 'long' format with melt. As the values in the lower triangle mirror those in the upper triangle, either triangle can be set to NA before melting:
library(reshape2)
# correlation matrix of all columns
out <- cor(df1)
# blank out the diagonal and the mirrored lower triangle
out[lower.tri(out, diag = TRUE)] <- NA
melt(out, na.rm = TRUE)
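If adding reshape2 is not desirable, the same long table can be sketched in base R by indexing the upper triangle directly. The name cor_long is made up for this illustration.

```r
# Data frame from the question
df1 <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(3, 5, 7, 4, 3),
  c = c(3, 6, 8, 1, 2),
  d = c(5, 3, 1, 3, 5)
)

# Full correlation matrix
out <- cor(df1)

# Row/column positions of the upper triangle (each pair once, no diagonal)
idx <- which(upper.tri(out), arr.ind = TRUE)

# One row per variable pair with its correlation
cor_long <- data.frame(
  Var1  = rownames(out)[idx[, "row"]],
  Var2  = colnames(out)[idx[, "col"]],
  value = out[idx]
)
print(cor_long)
```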

Loop over non-standard variable names in R

I have a dataframe (df) with variables that look similar to vector-variables:
myvariable[1], myvariable[2] , myvariable[3] , etc.
However, if I want to refer to them, R automatically puts backticks around them:
df$`myvariable[1]`
I want to use those variables within a for-loop, and hence, want to change the number within the brackets automatically. Does anyone know how to do this?
PS: This question is different from other questions insofar as R doesn't see my variables as vector variables but rather as single variables that look the same. Hence, the []-part of my variables is seen as only some kind of string and not as a subsetting operator.
PS2: dput(head(zTT$subjects[, c("myvariable[1]","myvariable[3]","myvariable[4]")],4))
structure(list(`myvariable[1]` = c(2, 4, 2, 9), `myvariable[3]` = c(1,
1, 2, 3), `myvariable[4]` = c(2, 4, 2, 7)), .Names = c("myvariable[1]",
"myvariable[3]", "myvariable[4]"), row.names = c(NA, 4L), class = "data.frame")
As akrun has suggested, you can use [[. The code below uses your own data frame to construct the string which corresponds to the list names.
temp <- structure(list(`myvariable[1]` = c(2, 4, 2, 9),
`myvariable[3]` = c(1, 1,2, 3),
`myvariable[4]` = c(2, 4, 2, 7)),
.Names = c("myvariable[1]", "myvariable[3]",
"myvariable[4]"), row.names = c(NA, 4L),
class = "data.frame")
for (i in c(1, 3, 4)) {
myVar <- paste0("myvariable[", i, "]")
print(temp[[myVar]])
}
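When the bracketed indices are not known in advance, one option is to pick the columns by pattern and loop over their names instead of rebuilding each name. This is a sketch under the assumption that all relevant columns follow the myvariable[number] pattern; cols and col_means are made-up names.

```r
# Same data as above; check.names = FALSE keeps the bracketed names intact
temp <- data.frame(
  `myvariable[1]` = c(2, 4, 2, 9),
  `myvariable[3]` = c(1, 1, 2, 3),
  `myvariable[4]` = c(2, 4, 2, 7),
  check.names = FALSE
)

# Select every column whose name looks like myvariable[<digits>]
cols <- grep("^myvariable\\[[0-9]+\\]$", names(temp), value = TRUE)

# Loop over the names; [[ ]] treats each name as a plain string
col_means <- vapply(cols, function(nm) mean(temp[[nm]]), numeric(1))
print(col_means)
```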

Trouble using predict with linear model in R [duplicate]

I have a simple problem but can't seem to figure out what I'm doing wrong. I am using predict to estimate values from a fitted linear model. The following code works correctly:
x <- c(1, 2, 3, 4, 5 , 6)
y <- c(1, 4, 9, 16, 25, 36)
model <- lm(y ~ x)
predict(model, newdata = data.frame(x=7))
However, when I use the same data for x and y, but in a dataframe, it does not work. For example,
df2 <- structure(list(x = c(1, 2, 3, 4, 5, 6), y = c(1, 4, 9, 16, 25,36)),
.Names = c("x", "y"), row.names = c(NA, -6L), class = "data.frame")
model <- lm(df2$y ~ df2$x)
predict(model, newdata = data.frame(x=7))
gives the following warning:
Warning message:
'newdata' had 1 row but variables found have 6 rows
I am using the same exact data and am expecting the same answer. Can anyone tell me what I am doing wrong? Thanks!
Try
model = lm(y ~ x, data = df2)
predict(model, newdata = data.frame(x = 7))
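The problem is the way you specified the formula: with df2$y ~ df2$x the predictor is stored in the model under the name df2$x, so predict() cannot find it in newdata (which only has a column named x) and falls back to the six original values.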
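A small sketch of the difference, using the data from the question. The names bad, good and pred are made up for illustration.

```r
df2 <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(1, 4, 9, 16, 25, 36))

# Formula built from df2$... : the predictor's name includes the "df2$" prefix
bad <- lm(df2$y ~ df2$x)
print(names(coef(bad)))

# Formula with bare column names plus data = : the predictor is simply "x"
good <- lm(y ~ x, data = df2)
print(names(coef(good)))

# newdata has a column called x, which matches the second model only
pred <- predict(good, newdata = data.frame(x = 7))
print(pred)
```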

Resources