I am trying to do an ANOVA analysis in R on a data set with one within factor and one between factor. The data come from an experiment to test the similarity of two testing methods. Each subject was tested with Method 1 and Method 2 (the within factor) and belonged to one of 4 different groups (the between factor). I have tried the aov, Anova (in the car package), and ezANOVA functions, and I am getting wrong values with every method I try. I am not sure where my mistake is, whether it is a lack of understanding of R or of the ANOVA itself. I have included the code that I feel should be working; I have tried a ton of variations of it hoping to stumble on the answer. This data set is balanced, but I have a lot of similar data sets and many are unbalanced. Thanks for any help you can provide.
library(car)
library(ez)
#set up data
sample_data <- data.frame(Subject = rep(1:20, 2),
                          Method  = rep(c('Method1', 'Method2'), each = 20),
                          Level   = rep(rep(c('Level1', 'Level2', 'Level3', 'Level4'), each = 5), 2))
sample_data$Result <- c(4.76,5.03,4.97,4.70,5.03,6.43,6.44,6.43,6.39,6.40,5.31,4.54,5.07,4.99,4.79,4.93,5.36,4.81,4.71,5.06,4.72,5.10,4.99,4.61,5.10,6.45,6.62,6.37,6.42,6.43,5.22,4.72,5.03,4.98,4.59,5.06,5.29,4.87,4.81,5.07)
sample_data[, 'Subject'] <- as.factor(sample_data[, 'Subject'])
#Set the contrasts if needed to run type 3 sums of squares for unbalanced data
#options(contrasts=c("contr.sum","contr.poly"))
#With the aov method, which as I understand it 'should' work
anova_aov <- aov(Result ~ Method*Level + Error(Subject/Method), data=sample_data)
print(summary(anova_aov))
#ezANOVA method
anova_ez = ezANOVA(data=sample_data, wid=Subject, dv = Result, within = Method, between=Level, detailed = TRUE, type=3)
print(anova_ez)
Also, here are the values I should be getting, as output by SAS:
[SAS ANOVA output table]
Actually, your R code is correct in both cases; running these data through SPSS yielded the same result. SAS, like SPSS, seems to require that the levels of the within factor appear in separate columns, so you will end up with 20 rows instead of 40. An arrangement like the one below might give you the desired result in SAS:
Subject  Level   Method1  Method2
1        Level1  4.76     4.72
2        Level1  5.03     5.10
...
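If you want to build that wide layout in R before moving the data to SAS, a minimal sketch using base R's reshape() might look like this (the renaming step is only cosmetic):
#reshape the long data to one row per Subject/Level, with one column per Method
wide_data <- reshape(sample_data, idvar = c("Subject", "Level"), timevar = "Method", direction = "wide")
#reshape() names the new columns Result.Method1 / Result.Method2, so strip the prefix
names(wide_data) <- sub("^Result\\.", "", names(wide_data))
head(wide_data)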
I'm trying to conduct a 3-way ANOVA in R, using the WRS2 package.
My data are heteroskedastic, so I need to use a robust version, e.g. trimmed means.
I have my data arranged in long form (a csv with 4 columns - 3 factors and 1 numeric). My call looks like this:
t3way(happiness ~ money*job*relationship, data = Dataset)
I get the following error: "Incomplete design! It needs to be full factorial!"
Thank you in advance!
I solved the same problem by re-applying factor() to the variables that had empty levels:
Dataset$myvar <- factor(Dataset$myvar)
That way only the levels that actually have data remain, and t3way then worked fine.
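If several of your factors might carry empty levels, droplevels() does the same thing for the whole data frame in one call; a small sketch, assuming the same variable names as above:
library(WRS2)
#drop unused levels from every factor column at once, then refit
Dataset <- droplevels(Dataset)
t3way(happiness ~ money*job*relationship, data = Dataset)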
Ok, I have a data frame with 250 observations of 9 variables. For simplicity, let's just label them A - I.
I've done all the standard stuff (converting things to int or factor, creating the data partition, test and train sets, etc).
What I want to do is use columns A and B, and predict column E. I don't want to use the entire set of nine columns, just these three when I make my prediction.
I tried only using the limited columns in the prediction, like this:
myPred <- predict(rfModel, newdata=myData)
where rfModel is my model, and myData only contains the two fields I want to use, as a dataframe. Unfortunately, I get the following error:
Error in predict.randomForest(rfModel, newdata = myData) :
variables in the training data missing in newdata
Honestly, I'm very new to R, and I'm not even sure this is feasible. I think the data that I'm collecting (the nine fields) are important to use for "training", but I can't figure out how to make a prediction using just the "resultant" field (in this case field E) and the other two fields (A and B), while keeping the other important data.
Any advice is greatly appreciated. I can post some of the code if necessary.
I'm just trying to learn more about things like this.
I assume you used the random forest method:
library(randomForest)
model <- randomForest(E ~ A + B, data = train)  #equivalently: E ~ . - C - D - F - G - H - I
pred <- predict(model, newdata = test)
As you can see, in this example only the A and B columns are used to build the model; the others are excluded from model building (though not removed from the dataset). If you want to include all of them, use E ~ . instead. It also means that if you build your model on all columns, you need to have those columns in the test set too; predict won't work without them. If the test data have only the A and B columns, the model has to be built on them alone.
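To make the contrast concrete, here is a small sketch using the column names from the question (purely illustrative):
library(randomForest)
model_ab  <- randomForest(E ~ A + B, data = train)  #built from A and B only
model_all <- randomForest(E ~ ., data = train)      #built from every other column
predict(model_ab, newdata = test[, c("A", "B")])    #fine: only A and B are needed
#predict(model_all, newdata = test[, c("A", "B")])  #would fail: training variables missing from newdata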
Hope it helped
As I mentioned in my comment above, perhaps you should be building your model using only the A and B columns. If you can't/don't want to do this, then one workaround perhaps would be to simply use the median values for the other columns when calling predict. Something like this:
myData <- data.frame(data[, c("A", "B")],  #E is the response, so it is not needed in newdata
                     C = median(data$C), D = median(data$D), F = median(data$F),
                     G = median(data$G), H = median(data$H), I = median(data$I))
myPred <- predict(rfModel, newdata=myData)
This would allow you to use your current model, built with 9 predictors. Of course, you would be assuming average behavior for all predictors except for A and B, which might not behave too differently from a model built solely on A and B.
I'm using the lme4 package to run a mixed model. I want to extract the fixed effect results and the random effect results into separate datasets, so that I can use them for further analysis, but unfortunately I could not.
E.g.
library(lme4)
mixed_result <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)
I tried to extract fixed effect and random effect using the following method:
fixEffect<-fixef(mixed_result)
randEffect<-ranef(mixed_result)
View(fixEffect)
I tried fixef and ranef for the fixed effects and random effects respectively and tried to create datasets from the results. But it gave me the following error:
Error in View : cannot coerce class "ranef.mer" to a data.frame
I actually want output like SAS's solutionF and solutionR. But if that's not possible, the coefficients of the fixed and random effects will do.
I'll be grateful if someone can help me.
Thanks and Regards,
Use str to see the structure of an object.
str(fixEffect)
# named vector, can probably be coerced to data.frame
View(as.data.frame(fixEffect))
# works just fine
str(randEffect)
# list of data frames (well, list of one data frame in this case)
View(randEffect$Subject)
If you had, say, slopes that also varied by Subject, they would go in the same Subject data frame as the Subject-level intercepts. However, if intercepts also varied by some other grouping variable, with a different number of levels than Subject, they obviously couldn't go in the same data frame. This is why a list of data frames is used, so that the same structure can generalize to more complex models.
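As a hedged aside: recent versions of lme4 also provide an as.data.frame() method for ranef objects, which flattens every grouping factor into one long data frame, so something like this may get you a solutionR-style table directly:
#random effects as one long data frame (columns such as grpvar, term, grp, condval)
rand_df <- as.data.frame(randEffect)
head(rand_df)
#the fixed effects are a plain named vector, so building a data frame is straightforward
fix_df <- data.frame(term = names(fixEffect), estimate = unname(fixEffect))
fix_df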
I'm trying to run a series of GLMM's on a large dataset to explore relationships between plant traits and environmental factors for each of several plant species at different research sites using plots and years as random factors in my models. I'm using plyr and I keep getting the following error message:
Error in eval.quoted(.variables, data) :
envir must be either NULL, a list, or an environment.
My data set is in the following format:
  Site  Plot   Species  FlowerDate  Year  Factor    FactorValue
1 AD    ADC01  CTETB    179         1999  numJulSF  160
And here is the code I am using:
data.list <- dlply(data,c("Species","Site","FlowerDate","Year", "Factor"),
function(df){lmer(FlowerDate~FactorValue+(1|Plot)+(1|Year),
data=df)})
I have seen that others have this issue, but I'm still having difficulty resolving it.
It seems to me that the main problem is that you are splitting the data on some of the variables that are actually included in the model ('FlowerDate' and 'Year'), which does not make sense in principle (there is no point in including an input variable that does not vary, or in modeling an output variable that is constant).
Other than that, the combination of dlply + lmer should work; in fact, I use it quite often without problems...
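For what it's worth, a sketch of what the corrected call might look like, splitting only on variables that are not part of the model formula (names taken from the question's data layout):
library(plyr)
library(lme4)
data.list <- dlply(data, c("Species", "Site", "Factor"),
                   function(df) lmer(FlowerDate ~ FactorValue + (1|Plot) + (1|Year), data = df))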
I have a dataset with about 11,500 rows and 15 factors. I only need to impute values for 3 of the factors, with only 2 of the factors having any significant number of missing values. I have been trying to use mice to create imputed datasets, and I am using the following code:
dataset<-read.csv("filename.csv",header=TRUE)
model<-success~1+course+medium+ethnicity+gender+age+enrollment+HSGPA+GPA+Pell+ethnicity*medium
library(mice)
vempty<-c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)  #row of zeros: this variable is not imputed
v12<-c(0,0,0,0,0,0,0,1,1,1,1,0,1,1,1)     #1s mark the variables used to impute the 12th variable
v13<-c(0,0,0,0,0,0,0,1,1,1,1,1,0,1,1)     #predictors for the 13th variable
v14<-c(0,0,0,0,0,0,0,1,1,1,1,1,1,0,1)     #predictors for the 14th variable
list<-list(vempty,vempty,vempty,vempty,vempty,vempty,vempty,vempty,vempty,vempty,vempty,v12,v13,v14,vempty)
predmatrix<-do.call(rbind,list)           #one row per variable, in the dataset's column order
MIdataset<-mice(dataset,m=2,predictorMatrix=predmatrix)
MIoutput<- pool(glm(model, data=MIdataset, family=binomial))
After this code, I get the error message:
Error in as.data.frame.default(data) :
cannot coerce class '"mids"' into a data.frame
I'm totally at a loss as to what this means. I had no trouble doing this same analysis by just deleting the missing data and using regular glm. I'd also like to fit a multilevel logistic model on the imputed datasets using lmer (that's the next step after I get this working with glm), so if there is anything I am doing wrong that will also affect that next step, it would be good to know that too. I've tried to search this error on the internet and I'm not getting anywhere. I'm just learning R, so I'm also not that familiar with the environment yet.
Thanks for your time!
You need to apply the with.mids function. I think the last line in your code should look like this:
pool(with(MIdataset, glm(formula(model), family = binomial)))
You could also try this:
expr <- 'success ~ course'
pool(with(MIdataset, glm(as.formula(expr), family = binomial)))
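For completeness, a sketch of the whole corrected workflow, reusing the objects from the question and writing the model formula directly inside with() so that its variables are looked up in each completed dataset:
library(mice)
MIdataset <- mice(dataset, m = 2, predictorMatrix = predmatrix)
fits <- with(MIdataset, glm(success ~ 1 + course + medium + ethnicity + gender + age +
                              enrollment + HSGPA + GPA + Pell + ethnicity*medium,
                            family = binomial))
pooled <- pool(fits)   #pool() combines the m analyses using Rubin's rules
summary(pooled)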