I've just run a mixed ANOVA using ezANOVA and I need to create a data frame from the output for use in an Rmd, but I cannot find any information on how to do it.
I've previously used aov() and broom::tidy(); however, tidy() cannot format the output I get from ezANOVA. I've tried as.data.frame, but it results in a very messy data frame, so I'd rather not use it. Does anybody know of a solution that gives an easy-to-read data frame similar to tidy()?
My ANOVA:
library(ez)
aov <- b <- ezANOVA(data=exp1.long,
dv=consensus,
wid=participant_id,
within=trait,
between=age_group,
type=3,
detailed=T
)
Have you tried adding return_aov = TRUE?
b <- ezANOVA(data=exp1.long,
dv=consensus,
wid=participant_id,
within=trait,
between=age_group,
type=3,
detailed=T,
return_aov = TRUE
)
The return_aov argument, if set to TRUE, computes and returns an aov object corresponding to the requested ANOVA (useful for computing post-hoc contrasts).
See for details:
https://rdrr.io/cran/ez/man/ezANOVA.html
The trick is:
df <- as.data.frame(print(aov))
We can then check the resulting df class to confirm it worked correctly:
class(df)
"data.frame"
And check the output (note: used my own data):
df
ANOVA.Effect ANOVA.DFn ANOVA.DFd ANOVA.F ANOVA.p ANOVA.p..05 ANOVA.ges
2 COND1 1 53 4.1938628 0.01947959 * 0.0070548612
3 COND2 1 53 3.6018758 0.02962809 * 0.0040817987
4 COND1:COND2 1 53 0.8371797 0.24026453 0.0008646178
Explanation: It is true that the return_aov = TRUE argument adds an aov object, which can provide similar results when used as summary(aov$aov). However, the result is still not a dataframe, and trying as.data.frame(summary(aov$aov)) outputs Error in as.data.frame.default(summary(aov$aov)) : cannot coerce class ‘"summary.aovlist"’ to a data.frame.
Hopefully, this provides the outcome you were looking for.
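A more direct route, sketched here on the assumption that the ezANOVA() result is a list whose ANOVA element is already a data frame (as the ez documentation describes): pull that element out instead of coercing the whole list. The `res` object below is a hand-built stand-in, filled with the values printed in the answer above, so the snippet runs without ez installed; with the real package, `res` would be the value returned by ezANOVA(). The tidy()-style column names are my own choice, not something ez or broom produces.

```r
# Stand-in for an ezANOVA() result: a list holding an `ANOVA` data frame.
# Values are copied from the printed output in the answer above.
res <- list(
  ANOVA = data.frame(
    Effect = c("COND1", "COND2", "COND1:COND2"),
    DFn    = c(1, 1, 1),
    DFd    = c(53, 53, 53),
    F      = c(4.1938628, 3.6018758, 0.8371797),
    p      = c(0.01947959, 0.02962809, 0.24026453),
    stringsAsFactors = FALSE
  )
)

# The ANOVA table is already a data frame, so no coercion is needed.
anova_df <- res$ANOVA

# Optional: rename the columns to resemble broom::tidy() output
# (hypothetical naming, chosen for illustration).
names(anova_df) <- c("term", "df_num", "df_den", "statistic", "p.value")
anova_df
```

With a real ezANOVA() result, other list elements (for example sphericity tests, when present) can be pulled out the same way by name.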
I'm having some trouble extracting information from the ICtab() function of the bbmle package. Essentially what I'm trying to do is run this function on a series of glm models, then add that output to a master data.frame object. However, while I can extract the $dqAIC and $df parameters from the ICtab() output, I cannot figure out a way to extract the row names themselves (i.e. the names of the models that are being input into ICtab). This is an issue because the ICtab() output is ordered in ascending order of $dqAIC - as such, I cannot pre-label a list or data.frame or matrix with the correct order, as the resulting $dqAIC values are not known ahead of time. To compound the problem, the ICtab() object class does not seem to be able to be coerced into a data.frame or any other object where I might be able to extract row.names() or anything similar.
What I'm looking for is a way to extract all the information from the ICtab() function, as a whole or in 3 pieces (row names, dqAIC values, and df values), and then append it to a master table along with some other information.
Below is a sample of the code I'm trying, along with some test data.
library(bbmle)
library(visreg)
library(splines)
library(foreign)
library(survival)
library(lubridate)
dfun<- function(object) {with(object, sum((weights*residuals^2)[weights>0])/df.residual)}
test.data.1 <- seq(1, 1000, by = 10)
num.days <- seq(1, 100, by = 1)
disp.global <- glm(test.data.1 ~ num.days, family=poisson(link="log"), na.action=na.exclude)
model.1 <- glm(test.data.1 ~ ns(num.days, df = 3), family=poisson(link="log"), na.action=na.exclude)
model.2 <- glm(test.data.1 ~ ns(num.days, df = 6), family=poisson(link="log"), na.action=na.exclude)
testIC <- ICtab(model.1, model.2, dispersion=dfun(disp.global),type="qAIC")
Which gives the result:
> testIC
dqAIC df
model.2 0 7
model.1 5 4
I can pull the dqAIC and df values:
> testIC$dqAIC
[1] 0.000000 5.018875
> testIC$df
[1] 7 4
But I cannot figure out a way to get the "model.2" and "model.1" row names; row.names(testIC) returns nothing, and rownames(testIC) simply returns a NULL:
> row.names(testIC)
> rownames(testIC)
NULL
And as far as I can tell, there is no way to change this output using list(), as.data.frame(), data.frame(), or any other object type to get these row names.
> as.data.frame(testIC)
Error in as.data.frame.default(testIC) :
cannot coerce class ‘"ICtab"’ to a data.frame
As a side note, in the documentation for the bbmle package, there appears to be a function called get.mnames() that should do exactly this - list the model names - however, it does not appear to be included in the bbmle package that is installed (my version matches the version of the documentation, 1.0.18):
> ls("package:bbmle")
[1] "AIC" "AICc" "AICctab" "AICtab" "anova" "BICtab" "call.to.char" "coef" "confint" "deviance"
[11] "formula" "ICtab" "logLik" "mle2" "namedrop" "parnames" "parnames<-" "plot" "predict" "profile"
[21] "qAIC" "qAICc" "relist2" "residuals" "sbeta" "sbetabinom" "sbinom" "simulate" "slice" "slice1D"
[31] "slice2D" "sliceOld" "snbinom" "snorm" "spois" "stdEr" "summary" "update" "vcov"
Any help getting these row names out of the ICtab() result would be greatly appreciated. The above code is simply a sample - what I'm actually doing is running multiple models, with a series of datasets, through the ICtab() function, and I want to put all of that information together in one data.frame object as the result.
Thanks in advance,
Nate
I had the same problem as yours, and I can see that nobody replied to your post.
I am not proud of my solution. It is not very elegant, but it works:
class(testIC) <- "data.frame"
rownames(testIC)
I hope it helps someone, someday.
trantsyx's solution actually works fine. It can be combined with the convenient table2office commands from the {export} package. Works perfectly for me.
I am trying to do an ANOVA in R on a data set with one within factor and one between factor. The data is from an experiment to test the similarity of two testing methods. Each subject was tested in Method 1 and Method 2 (the within factor) and also belongs to one of 4 different groups (the between factor). I have tried using the aov, Anova (in the car package), and ezANOVA functions. I am getting wrong values with every method I try. I am not sure where my mistake is, or whether it's a lack of understanding of R or of the ANOVA itself. I have included the code that I feel should be working. I have tried a ton of variations of this, hoping to stumble on the answer. This set of data is balanced, but I have a lot of similar data sets and many are unbalanced. Thanks for any help you can provide.
library(car)
library(ez)
#set up data
sample_data <- data.frame(Subject=rep(1:20,2),Method=rep(c('Method1','Method2'),each=20),Level=rep(rep(c('Level1','Level2','Level3','Level4'),each=5),2))
sample_data$Result <- c(4.76,5.03,4.97,4.70,5.03,6.43,6.44,6.43,6.39,6.40,5.31,4.54,5.07,4.99,4.79,4.93,5.36,4.81,4.71,5.06,4.72,5.10,4.99,4.61,5.10,6.45,6.62,6.37,6.42,6.43,5.22,4.72,5.03,4.98,4.59,5.06,5.29,4.87,4.81,5.07)
sample_data[, 'Subject'] <- as.factor(sample_data[, 'Subject'])
#Set the contrasts if needed to run type 3 sums of squares for unbalanced data
#options(contrasts=c("contr.sum","contr.poly"))
#With aov method as I understand it 'should' work
anova_aov <- aov(Result ~ Method*Level + Error(Subject/Method),data=sample_data)
print(summary(anova_aov))
#ezANOVA method
anova_ez = ezANOVA(data=sample_data, wid=Subject, dv = Result, within = Method, between=Level, detailed = TRUE, type=3)
print(anova_ez)
Also, these are the values I should be getting, as output by SAS:
[image: SAS ANOVA output]
Actually, your R code is correct in both cases. Running these data through SPSS yielded the same result. SAS, like SPSS, seems to require that the levels of the within factor appear in separate columns. You will end up with 20 rows instead of 40. An arrangement like the one below might give you the desired result in SAS:
Subject Level Method1 Method2
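The long-to-wide rearrangement described above can be done in R before handing the data to SAS. This is a minimal sketch using base reshape() on the question's sample_data; the Result.Method1 and Result.Method2 column names are simply what reshape() generates, not anything SAS requires.

```r
# Rebuild the question's data: 40 rows (20 subjects x 2 methods).
sample_data <- data.frame(
  Subject = factor(rep(1:20, 2)),
  Method  = rep(c("Method1", "Method2"), each = 20),
  Level   = rep(rep(c("Level1", "Level2", "Level3", "Level4"), each = 5), 2)
)
sample_data$Result <- c(
  4.76, 5.03, 4.97, 4.70, 5.03, 6.43, 6.44, 6.43, 6.39, 6.40,
  5.31, 4.54, 5.07, 4.99, 4.79, 4.93, 5.36, 4.81, 4.71, 5.06,
  4.72, 5.10, 4.99, 4.61, 5.10, 6.45, 6.62, 6.37, 6.42, 6.43,
  5.22, 4.72, 5.03, 4.98, 4.59, 5.06, 5.29, 4.87, 4.81, 5.07
)

# One row per subject, one Result column per level of the within factor.
wide <- reshape(sample_data,
                idvar     = c("Subject", "Level"),
                timevar   = "Method",
                direction = "wide")

dim(wide)    # 20 rows, 4 columns
names(wide)  # Subject, Level, Result.Method1, Result.Method2
```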
I'm trying to extract OLS coefficients from a set of models that are run under lapply.
The problem is that not all submodels in my list have all levels, and most of the time I end up with a "subscript out of bounds" error when indexing the coefficients. For example, in the code below the "No Reward" option is not present for Reward2, so result0$coef[3,1] will be out of bounds because no third estimate is reported.
Question: is it possible to force "lm" to report all the coefficients specified in the model even if there is no estimate available?
I would like to apologize to the community for not presenting reproducible code in my earlier attempt. Since then I have solved the problem by checking within the function for the presence of a particular estimate, but the question still remains, and here is the code:
RewardList<-c("Reward1","Reward2")
set.seed(1234)
GL<-rnorm(10)
RewardName<-rep(c("Reward1","Reward2"),each=5)
NoReward<-c(0,1,0,1,0, 0,0,0,0,0)
Under1<- c(1,0,0,0,1, 0,1,0,1,0)
Above1<- c(0,0,1,0,0, 1,0,1,0,1)
tinput<-as.data.frame(RewardName)
tinput<-cbind(tinput,GL,NoReward,Under1,Above1)
regMF <- lapply(seq_along(RewardList),
function (n) {
tinput <- tinput[tinput$RewardName==RewardList[n],]
result0 <- summary(lm(GL~NoReward+Under1+Above1-1,tinput))
result1 <- result0$coef[1,1] #no rebate
result2 <- result0$coef[2,1]
result3 <- result0$coef[3,1]
return(list(result1,result2,result3))})
To make your function more robust, you could use tryCatch or use dplyr::failwith. For instance, replacing
result0 <- summary(lm(GL~`No Reward`+`Under 1%`-1,tinput))
with
require(dplyr)
result0 <- failwith(NULL, function(d) summary(lm(GL~`No Reward`+`Under 1%`-1, d)))(tinput)
may work, though it is difficult to tell without a reproducible example. On how to produce a reproducible example, please have a look here.
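As a rough illustration of the tryCatch route, here is a sketch built on the data from the question. It looks coefficients up by name and returns NA when a term is missing from the summary table, which is what happens for NoReward in the Reward2 subset. The helper name safe_coef is hypothetical, and split() is used here in place of the question's index-based subsetting.

```r
# Data from the question.
set.seed(1234)
tinput <- data.frame(
  RewardName = rep(c("Reward1", "Reward2"), each = 5),
  GL         = rnorm(10),
  NoReward   = c(0, 1, 0, 1, 0,  0, 0, 0, 0, 0),
  Under1     = c(1, 0, 0, 0, 1,  0, 1, 0, 1, 0),
  Above1     = c(0, 0, 1, 0, 0,  1, 0, 1, 0, 1)
)

# Hypothetical helper: return the estimate for `term`, or NA if the
# term is absent from the coefficient table (lookup raises an error).
safe_coef <- function(coef_table, term) {
  tryCatch(coef_table[term, "Estimate"], error = function(e) NA_real_)
}

terms_wanted <- c("NoReward", "Under1", "Above1")

regMF <- lapply(split(tinput, tinput$RewardName), function(d) {
  # summary() drops terms that could not be estimated, so the table
  # may have fewer rows than the formula has terms.
  tab <- summary(lm(GL ~ NoReward + Under1 + Above1 - 1, data = d))$coefficients
  sapply(terms_wanted, function(term) safe_coef(tab, term))
})

regMF$Reward2  # NoReward is NA; the other two are estimated
```

Looking terms up by name rather than by row position also avoids silently returning the wrong coefficient when a row is missing.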
Although not exactly your question, I would like to point out a potentially easier way to arrange your data using broom, which avoids the list of lists-output structure of your approach. With broom and dplyr, you can collect the output of your models in a dataframe for easier access. For instance, have a look at the output of
library(dplyr)
library(broom)
mtcars %>% group_by(gear) %>% do(data.frame(tidy(lm(mpg~cyl, data=.), conf.int=T)))
Here, you can wrap the lm function in failwith as well.
As I said, use lmList:
library(nlme)
fits <- lmList(GL~NoReward+Under1+Above1-1 | RewardName, data = tinput)
coef(fits)
# NoReward Under1 Above1
#Reward1 -1.034134 -0.3889705 1.0844412
#Reward2 NA -0.5695960 -0.3102046
I'm trying to run a regression for every zipcode in my dataset and save the coefficients to a data frame but I'm having trouble.
Whenever I run the code below, I get a data frame called "coefficients" containing every zip code but with the intercept and coefficient for every zipcode being equal to the results of the simple regression lm(Sealed$hhincome ~ Sealed$square_footage).
When I run the code as indicated in Ranmath's example at the link below, everything works as expected. I'm new to R after many years with STATA, so any help would be greatly appreciated :)
R extract regression coefficients from multiply regression via lapply command
library(plyr)
Sealed <- read.csv("~/Desktop/SEALED.csv")
x <- function(df) {
lm(Sealed$hhincome ~ Sealed$square_footage)
}
regressions <- dlply(Sealed, .(Sealed$zipcode), x)
coefficients <- ldply(regressions, coef)
Because dlply takes a ... argument that allows additional arguments to be passed to the function, you can make things even simpler:
dlply(Sealed,.(zipcode),lm,formula=hhincome~square_footage)
The first two arguments to lm are formula and data. Since formula is specified here, lm will pick up the next argument it is given (the relevant zipcode-specific chunk of Sealed) as the data argument ...
You are applying the function:
x <- function(df) {
lm(Sealed$hhincome ~ Sealed$square_footage)
}
to each subset of your data, so we shouldn't be surprised that the output each time is exactly
lm(Sealed$hhincome ~ Sealed$square_footage)
right? Try replacing Sealed with df inside your function. That way you're referring to the variables in each individual piece passed to the function, not the whole variable in the data frame Sealed.
The issue is not with plyr but rather in the definition of the function. You are calling a function, but not doing anything with the variable.
As an analogy,
myFun <- function(x) {
3 * 7
}
> myFun(2)
[1] 21
> myFun(578)
[1] 21
If you run this function on different values of x, it will still give you 21, no matter what x is. That is, there is no reference to x within the function. In my silly example, the correction is obvious; in your function above, the confusion is understandable. The $hhincome and $square_footage should conceivably serve as variables.
But you want your x to vary over what comes before the $. As @Joran correctly pointed out, swap Sealed$hhincome with df$hhincome (and the same for $square_footage) and that will help.
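To make the fix concrete, here is a small sketch in which the function uses the piece it is given instead of the whole data frame. It uses base split() and lapply() rather than plyr so that it runs without extra packages, and the tiny Sealed data frame is a made-up stand-in for SEALED.csv; the same fit_zip function (a hypothetical name) could be passed to dlply.

```r
# Made-up stand-in for SEALED.csv: two zip codes with different intercepts.
Sealed <- data.frame(
  zipcode        = rep(c("10001", "10002"), each = 4),
  square_footage = c(800, 1000, 1200, 1400, 900, 1100, 1300, 1500),
  hhincome       = c(40, 50, 60, 70, 95, 105, 115, 125)
)

# The function refers only to its argument `df`, so each call fits
# the model on that zip code's rows alone.
fit_zip <- function(df) {
  lm(hhincome ~ square_footage, data = df)
}

# One model per zip code, then one row of coefficients per zip code.
regressions  <- lapply(split(Sealed, Sealed$zipcode), fit_zip)
coefficients <- do.call(rbind, lapply(regressions, coef))
coefficients
```

Because the formula names bare columns and the subset is passed as data, the rows of the result now differ by zip code instead of repeating the pooled fit.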
Training data is read in from two files: one with the independent variables only (df.train) and one with the actual corresponding class values only (df.churn). These values are -1 and 1 only. I then remove all-NA columns and remove duplicate columns if any are found.
I assemble the two sets of data into a single dataframe with independent and class values, and run naiveBayes() without any errors.
Using the model produced by naiveBayes, I run predict() and note that the output with type = "raw" looks like reasonable data--in most cases those probabilities are relatively close to 0 or 1. I show the first 6 elements below.
I'm looking for the actual predicted class values for input into prediction() with a view to getting an ROC plot and an AUC value. I run predict() again with type = "class", and this is where I get basically nothing at all.
df.train <- read.csv('~/projects/kdd_analysis/data/train_table.csv', header=TRUE, sep=',')
df.churn <- read.csv('~/projects/kdd_analysis/data/sm_churn_labels.csv', header=TRUE, sep=',')
df.train <- df.train[,colSums(is.na(df.train))<nrow(df.train)]
df.train <- df.train[!duplicated(lapply(df.train,c))]
df.train_C <- cbind(df.train, df.churn)
mod_C <- naiveBayes(V1~., df.train_C, laplace=0.01)
pre_C <- predict(mod_C, df.train ,type="raw", threshold=0.001)
I'm running predict() against the training data intentionally because I thought that would be interesting. Below, the values out of predict() seem 'reasonable' to me...that is, they at least don't seem like complete nonsense. I have not compared them to the actuals yet, and would expect to use the explicit class values given by predict() to do that.
head(pre_C)
-1 1
[1,] 9.996934e-01 3.066321e-04
[2,] 9.005501e-07 9.999991e-01
[3,] 1.000000e+00 3.468739e-11
[4,] 9.362914e-01 6.370858e-02
[5,] 9.854649e-01 1.453510e-02
[6,] 9.997680e-01 2.320003e-04
So, this is predict() run again against the identical model--I don't understand how it's possible for it to return nothing:
> pre_C <- predict(mod_C, df.train ,type="class", threshold=0.001)
> pre_C
factor(0)
Levels:
The solution is to coerce the column of class variables to type factor:
df.train_C$V1 <- factor(df.train_C$V1)
then run the model and predict() as before. I changed nothing else and this one mod 'fixed' the issue. Courtesy Andy Liaw at r-help.