need help extracting multiple p-values from aov - r

I know there are a lot of post about how to extract the p-value from an aov. However, I have a list with several thousand samples. i did an aov for each sample to compare two different treatments and now i am looking for a way to get a list with all the p-values, as i cannot copy them one by one..
is this even possible?
I had no problems doing this for the p-values created by a ttest:
results <- apply(data,1,function(x){t.test(x[1:3],x[4:6])$p.value})
data is my imported .csv and [1:3] indicates the columns that are compared with the columns [4:6]
so that really was not a problem, but it seems not to be possible to do something similar for the aov:
results <- apply(data,1,function(x){aov(x[1:3]~x[4:6])})
i cannot get a list with all the p-values (that are called Pr(>F)..which is kind of frustrating..
hope you understand what i am trying to do,

results <- apply(data,1,function(x){anova(aov(x[1:3]~x[4:6]))[['Pr(>F)']][1]})

Youll probably want lapply if the data is in a list already. And you can use summary to get the p-values from aov
lapply(yourData, function(x){
av <- aov(yourFormula, data = x)
summary(av)[[1]][,5]
})

Related

apply fisher test in a large dataset that join all contingency tables

I have a dataset like this:
contingency_table<-tibble::tibble(
x1_not_happy = c(1,4),
x1_happy = c(19,31),
x2_not_happy = c(1,4),
x2_happy= c(19,28),
x3_not_happy=c(14,21),
X3_happy=c(0,9),
x4_not_happy=c(3,13),
X4_happy=c(17,22)
)
in fact, there are many other variables that come from a poll aplied in two different years.
Then, I apply a Fisher test in each 2X2 contingency matrix, using this code:
matrix1_prueba <- contingency_table[1:2,1:2]
matrix2_prueba<- contingency_table[1:2,3:4]
fisher1<-fisher.test(matrix1_prueba,alternative="two.sided",conf.level=0.9)
fisher2<-fisher.test(matrix2_prueba,alternative="two.sided",conf.level=0.9)
I would like to run this task using a short code by mean of a function or a loop. The output must be a vector with the p_values of each questions.
Thanks,
Frederick
So this was a bit of fun to do. The main thing that you need to recognize is that you want combinations of your data. There are a number of functions in R that can do that for you. The main workhorse is combn() Link
So in the language of the problem, we want all combinations of your tibble taken 2 at a time link2
From there, you just need to do some looping structure to get your tests to work, and extract the p-values from the object.
list_tables <- lapply(combn(contingency_table,2,simplify=F), fisher.test)
unlist(lapply(list_tables, `[`, 'p.value'))
This should produce your answer.
EDIT
Given the updated requirements for just adjacement data.frame columns, the following modifications should work.
full_list <- combn(contingency_table,2,simplify=F)
full_list <- full_list[sapply(
full_list, function(x) all(startsWith(names(x), substr(names(x)[1], 1,2))))]
full_list <- lapply(full_list, fisher.test)
unlist(lapply(full_list, `[`, 'p.value'))
This is approximately the same code as before, but now we have to find the subsets of the data that have the same question prefix name. This only works if the prefixes are exactly the same (X3 != x3). I think this is a better solution than trying to work with column indexes, and without the guarantee of always being next to one another. The sapply code does just that. The final output should be what you need for the problem.

Mixed Anova in R

I am trying to do an anova anaysis in R on a data set with one within factor and one between factor. The data is from an experiment to test the similarity of two testing methods. Each subject was tested in Method 1 and Method 2 (the within factor) as well as being in one of 4 different groups (the between factor). I have tried using the aov, the Anova(in car package), and the ezAnova functions. I am getting wrong values for every method I try. I am not sure where my mistake is, if its a lack of understanding of R or the Anova itself. I included the code I used that I feel should be working. I have tried a ton of variations of this hoping to stumble on the answer. This set of data is balanced but I have a lot of similar data sets and many are unblanced. Thanks for any help you can provide.
library(car)
library(ez)
#set up data
sample_data <- data.frame(Subject=rep(1:20,2),Method=rep(c('Method1','Method2'),each=20),Level=rep(rep(c('Level1','Level2','Level3','Level4'),each=5),2))
sample_data$Result <- c(4.76,5.03,4.97,4.70,5.03,6.43,6.44,6.43,6.39,6.40,5.31,4.54,5.07,4.99,4.79,4.93,5.36,4.81,4.71,5.06,4.72,5.10,4.99,4.61,5.10,6.45,6.62,6.37,6.42,6.43,5.22,4.72,5.03,4.98,4.59,5.06,5.29,4.87,4.81,5.07)
sample_data[, 'Subject'] <- as.factor(sample_data[, 'Subject'])
#Set the contrats if needed to run type 3 sums of square for unblanaced data
#options(contrats=c("contr.sum","contr.poly"))
#With aov method as I understand it 'should' work
anova_aov <- aov(Result ~ Method*Level + Error(Subject/Method),data=test_data)
print(summary(anova_aov))
#ezAnova method,
anova_ez = ezANOVA(data=sample_data, wid=Subject, dv = Result, within = Method, between=Level, detailed = TRUE, type=3)
print(anova_ez)
Also, the values I should be getting as output by SAS
SAS Anova
Actually, your R code is correct in both cases. Running these data through SPSS yielded the same result. SAS, like SPSS, seems to require that the levels of the within factor appear in separate columns. You will end up with 20 rows instead of 40. An arrangmement like the one below might give you the desired result in SAS:
Subject Level Method1 Method2

Output for large correlation matrices in R

From what I've seen, R cannot very easily produce usable output for large correlation matrices (50-100 variables). For instance, "corr.test" or "cor" output is horrendously wrapped (each variable should have only one row and one column, but this is certainly not the case) and does not copy well into Excel for later examination. Is there a way to produce SPSS-like correlation output in R? Namely, correlation matrices that can be copied and pasted easily into something like Excel, where each row and each column pertains to one variable (no wrapping of text), and ideally, sample-sizes and significance values are somehow available. Corr.test provides this information, albeit in an inconvenient format, and when variables exceed output viewer space in R, the output is basically unreadable. Any thoughts would be greatly, greatly appreciated, as I'm frequently working with many variables at once.
Is there anything wrong with
z <- matrix(rnorm(10000),100)
write.csv(cor(z),file="cortmp.csv")
? View(cor(z)) works for me, although I don't know if it's copy-and-pasteable.
For psych::corr.test
dimnames(z) <- list(1:100,1:100)
z[1,2] <- NA ## unbalance to induce sample size matrix
ct <- psych::corr.test(z)
write.csv(ct$n,file="ntmp.csv") ## sample sizes
write.csv(ct$t,file="ttmp.csv") ## t statistics
write.csv(ct$p,file="ptmp.csv") ## p-values
et cetera. (See str(ct).)
R's paradigm is that if you want to transfer information to another program you're going to output it to a file rather than copying and pasting it from the console ...

Performing statistics on multiple columns of data

I'm trying to conduct certain statistics such as t-tests on a table of data containing hundreds to thousands of columns. The data is formatted in a way that the two groups of values I'm comparing are in the same column.
So, basically my first attempt was to cut and paste like the following;
NN <-read.delim("E:/output.txt")
View(NN)
attach(NN)
#output p-values of 100 t-tests
sink(file="E:/ttest.txt", append=TRUE, split=FALSE)
t.test(Tree1[1:13],Tree1[14:34])$p.value
t.test(Tree2[1:13],Tree2[14:34])$p.value
t.test(Tree3[1:13],Tree3[14:34])$p.value
....
...
..
.
As my data grows, this is becoming more and more impractical. Is there a way to loop these t-tests through each column sequentially and save the ouput to file?
Thanks in advance.
lapply will get you there I think with an anonymous function:
> test <- data.frame(a=1:100,b=101:200)
> lapply(test,function(x) t.test(x[1:50],x[51:100])$p.value)
$a
[1] 2.876776e-31
$b
[1] 2.876776e-31
I should do my part for good practice and also note that running 100 t-tests in a single go is fraught with the potential for type-1 errors and other badness.
Extracting the p-value in isolation is also probably a really bad move.
Not sure if this is a wise approach or if it even works correctly but try mapply with the indexed parts as in:
test <- data.frame(a=1:100,b=101:200)
testa <- test[1:50, ]
testb <- test[51:100, ]
t.test2 <- function(x, y) t.test(x, y)[["p.value"]]
mapply(t.test2, testa, testb)
EDIT: I used thelatemail's data so it's comparable. His warning is right on.
Thanks for all the input. Just a few clarifications; while I AM running hundreds of t-tests at once, they are comparing independent sets of data each time. So for example, the values in column 1 (Tree1), rows 1:50 would only be compared once to rows 51:100 in the same column, and never used again. The same for column 2 (Tree2), and so on. Would type-1 error still be a problem? the way I see it I'm basically doing t-tests on separate data sets one at a time.
That being said, I've come up with a way to do this with a for-loop, and the results correspond to those when t-testing each column individually.
for (i in 1:100)
print (t.test(mydata[1:50, i],mydata[51:100, i])$p.value)
end;
The only problem being that my output always has a [1] in front of it.

Using paste() within a summary() call for linear regression models

For each of 100 data sets, I am using lm() to generate 7 different equations and would like to extract and compare the p-values and adjusted R-squared values.
Kindly assume that lm() is in fact the best regression technique possible for this scenario.
In searching the web I've found a number of useful examples for how to create a function that will extract this information and write it elsewhere, however, my code uses paste() to label each of the functions by the data source, and I can't figure out how to include these unique pasted names in the function I create.
Here's a mini-example:
temp <- data.frame(labels=rep(1:10),LogPre= rnorm(10))
temp$labels2<-temp$labels^2
testrun<-c("XX")
for (i in testrun)
{
assign(paste(i,"test",sep=""),lm(temp$LogPre~temp$labels))
assign(paste(i,"test2",sep=""),lm(temp$LogPre~temp$labels2))
}
I would then like to extract the coefficients of each equation
But the following doesn't work:
summary(paste(i,"test",sep="")$coefficients)
and neither does this:
coef(summary(paste(i,"test",sep="")))
Both generating the error :$ operator is invalid for atomic vectors
EVEN THOUGH
summary(XXtest)$coefficients
and
coef(summary(XXtest))
work just fine.
How can I use paste() within summary() to allow me to do this for AAtest, AAtest2, ABtest, ABtest2, etc.
Thanks!
Hard to tell exactly what your purpose is, but some kind of apply loop may do what you want in a simpler way. Perhaps something like this?
temp <- data.frame(labels=rep(1:10),LogPre= rnorm(10))
temp$labels2<-temp$labels^2
testrun<-c("XX")
names(testrun) <- testrun
out <- lapply(testrun, function(i) {
list(test1=lm(temp$LogPre~temp$labels),
test2=lm(temp$LogPre~temp$labels2))
})
Then to get all the p-values for the slopes you could do:
> sapply(out, function(i) sapply(i, function(x) coef(summary(x))[2,4]))
XX
test1 0.02392516
test2 0.02389790
Just using paste results in a character string, not the object with that name. You need to tell R to get the object with that name by using get.
summary(get(paste(i,"test",sep="")))$coefficients

Resources