I'm having some trouble extracting information from the ICtab() function of the bbmle package. Essentially what I'm trying to do is run this function on a series of glm models, then add that output to a master data.frame object. However, while I can extract the $dqAIC and $df parameters from the ICtab() output, I cannot figure out a way to extract the row names themselves (i.e. the names of the models that are being input into ICtab). This is an issue because the ICtab() output is ordered in ascending order of $dqAIC - as such, I cannot pre-label a list or data.frame or matrix with the correct order, as the resulting $dqAIC values are not known ahead of time. To compound the problem, the ICtab() object class does not seem to be able to be coerced into a data.frame or any other object where I might be able to extract row.names() or anything similar.
What I'm looking for is a way to extract all the information from the ICtab() function, as a whole or in 3 pieces (row names, dqAIC values, and df values), and then append it to a master table along with some other information.
Below is a sample of the code I'm trying, along with some test data.
library(bbmle)
library(visreg)
library(splines)
library(foreign)
library(survival)
library(lubridate)
dfun<- function(object) {with(object, sum((weights*residuals^2)[weights>0])/df.residual)}
test.data.1 <- seq(1, 1000, by = 10)
num.days <- seq(1, 100, by = 1)
disp.global <- glm(test.data.1 ~ num.days, family=poisson(link="log"), na.action=na.exclude)
model.1 <- glm(test.data.1 ~ ns(num.days, df = 3), family=poisson(link="log"), na.action=na.exclude)
model.2 <- glm(test.data.1 ~ ns(num.days, df = 6), family=poisson(link="log"), na.action=na.exclude)
testIC <- ICtab(model.1, model.2, dispersion=dfun(disp.global),type="qAIC")
Which gives the result:
> testIC
dqAIC df
model.2 0 7
model.1 5 4
I can pull the dqAIC and df values:
> testIC$dqAIC
[1] 0.000000 5.018875
> testIC$df
[1] 7 4
But I cannot figure out a way to get the "model.2" and "model.1" row names; row.names(testIC) returns nothing, and rownames(testIC) simply returns a NULL:
> row.names(testIC)
> rownames(testIC)
NULL
And as far as I can tell, there is no way to change this output using list(), as.data.frame(), data.frame(), or any other object type to get these row names.
> as.data.frame(testIC)
Error in as.data.frame.default(testIC) :
cannot coerce class "ICtab" to a data.frame
As a side note, in the documentation for the bbmle package, there appears to be a function called get.mnames() that should do exactly this - list the model names - however, it does not appear to be included in the bbmle package that is installed (my version matches the version of the documentation, 1.0.18):
> ls("package:bbmle")
[1] "AIC" "AICc" "AICctab" "AICtab" "anova" "BICtab" "call.to.char" "coef" "confint" "deviance"
[11] "formula" "ICtab" "logLik" "mle2" "namedrop" "parnames" "parnames<-" "plot" "predict" "profile"
[21] "qAIC" "qAICc" "relist2" "residuals" "sbeta" "sbetabinom" "sbinom" "simulate" "slice" "slice1D"
[31] "slice2D" "sliceOld" "snbinom" "snorm" "spois" "stdEr" "summary" "update" "vcov"
Any help getting these row names out of the ICtab() result would be greatly appreciated. The above code is simply a sample - what I'm actually doing is running multiple models, with a series of datasets, through the ICtab() function, and I want to put all of that information together in one data.frame object as the result.
Thanks in advance,
Nate
I had the same problem, and I can see that nobody replied to your post.
I am not proud of my solution; it is not very elegant, but it works:
class(testIC) <- "data.frame"
rownames(testIC)
I hope it helps someone, someday.
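Since rownames() works once the class is switched to "data.frame", the model names are presumably stored in a "row.names" attribute on a plain list. A minimal sketch under that assumption follows; the testIC object here is a hand-built stand-in (not produced by bbmle) that reuses the values from the question, and ic_df is a hypothetical name.

```r
# Stand-in for an ICtab result: a list with a "row.names" attribute.
# Assumption: the real object returned by ICtab() is laid out like this.
testIC <- structure(
  list(dqAIC = c(0, 5.018875), df = c(7, 4)),
  row.names = c("model.2", "model.1"),
  class = "ICtab"
)

# Read the model names straight from the attribute, without changing the class
model_names <- attr(testIC, "row.names")

# Assemble an ordinary data frame that can be appended to a master table
ic_df <- data.frame(
  model = model_names,
  dqAIC = testIC$dqAIC,
  df = testIC$df,
  stringsAsFactors = FALSE
)
print(ic_df)
```

Building a fresh data frame leaves the original object untouched, which may be preferable if you still want its print method afterwards.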
trantsyx's solution actually works fine. It can be combined with the convenient table2office commands from the {export} package. Works perfectly for me.
I am trying to add a space to a layer name of a RasterStack:
names(predstack)[[1]] <- "MSR670 max"
> names(predstack)[[1]]
[1] "MSR670.max"
I know this is stupid, but I fitted a model that took 7 days to compute. Unfortunately, the name of one of the model variables contains a space.
Now I want to predict on the stack, and that does not work because the following error appears:
> Prediction2model <- raster::predict(predstack, var2model)
Error in predict.randomForest(modelFit, newdata) :
variables in the training data missing in newdata
> names(predstack)
[1] "MSR670.max" "GLI201809_means"
[3] "MSR670201809_sd" "MVI201805_max"
> var2model$selectedvars
[1] "MSR670 max" "GLI201809_means"
[3] "MSR670201809_sd" "MVI201805_max"
So my question is: how can I add a space character to a raster layer name?
Or is it possible to change the variable name in the random forest model afterwards?
Thank you for any ideas!
The reason why names(predstack)[[1]] <- "MSR670 max" is not working as intended for you is the way the names() function is implemented in the raster package.
Here you can see the method that is applied when calling the names() function on an object of class RasterStack. In line 60 the method calls the validNames() function which is defined here.
validNames() in turn relies on a function called make.names() which basically ensures that there are no white spaces and other unwanted characters within a string (e.g. your raster name).
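The renaming can be reproduced with base R alone. This small sketch calls make.names() directly on two of the layer names from the question; it does not touch the raster package, so it only illustrates the name sanitising step.

```r
# Two of the layer names from the question: one with a space, one without
layer_names <- c("MSR670 max", "GLI201809_means")

# make.names() turns characters that are invalid in syntactic names into "."
valid <- make.names(layer_names)
print(valid)
# [1] "MSR670.max"      "GLI201809_means"
```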
So looking at the way the names method assigns names to objects of class raster or rasterStack gives a hint on how we might be able to circumvent the issue.
TLDR:
predstack[[1]]@data@names <- "MSR670 max"
should assign the name directly to the raster without using the names() function and the implemented method for it.
Since it's not intended to assign raster names that way there may be problems occurring down the line with this approach, but it might be just enough in your specific situation.
In case someone has the same problem, here is the solution for when there is a space in the model variable names.
It is not possible to change the names in the model. As @Eike showed above, raster layer names can be changed, but raster::predict() will change them back in the same way.
The only solution is to extract the raster values into a data frame and predict on that. You can then write the predicted values back into a raster to get the prediction for the stack as a raster.
Here is the solution based on the given example:
name <- names(predstack)
varname <- var2model$selectedvars
# loop over the layers: use nlayers(), since length() is not the layer count
n_layers <- nlayers(predstack)
# initialize progress bar
pb <- txtProgressBar(min = 0, max = n_layers, initial = 0, style = 3)
# extract the raster values layer by layer into one data frame
print(paste0("Starting extracting raster data: ", Sys.time()))
for (i in seq_len(n_layers)) {
  df <- as.data.frame(predstack[[i]])
  colnames(df) <- name[[i]]
  if (i == 1) {
    obs <- df
  } else {
    obs <- cbind(obs, df)
  }
  setTxtProgressBar(pb, i)
}
close(pb)
print(paste0("Finished extracting raster data: ", Sys.time()))
# restore the variable name with the space, as the model expects it
colnames(obs)[[1]] <- varname[1]
# predict on the data frame, then write the predictions back into a raster
pred_stack_df <- predict(var2model, obs)
predraster <- predstack[[1]]
values(predraster) <- pred_stack_df
I've just run a mixed ANOVA using ezANOVA and I need to create a data frame with the output for extraction into an Rmd, but I cannot find any information on how to do it.
I've previously used aov() and broom::tidy(); however, tidy() cannot format the output I get from ezANOVA. I've tried as.data.frame, but it results in a very messy data frame, so I'd rather not use it. Does anybody know of a solution that gives an easy-to-read data frame similar to tidy()?
My ANOVA:
library(ez)
aov <- b <- ezANOVA(data=exp1.long,
dv=consensus,
wid=participant_id,
within=trait,
between=age_group,
type=3,
detailed=T
)
Have you tried adding return_aov = TRUE?
b <- ezANOVA(data=exp1.long,
dv=consensus,
wid=participant_id,
within=trait,
between=age_group,
type=3,
detailed=T,
return_aov = TRUE
)
The return_aov argument, if set to TRUE, computes and returns an aov object corresponding to the requested ANOVA (useful for computing post-hoc contrasts).
See for details:
https://rdrr.io/cran/ez/man/ezANOVA.html
The trick is:
df <- as.data.frame(print(aov))
We can then check the resulting df class to confirm it worked correctly:
class(df)
"data.frame"
And check the output (note: used my own data):
df
  ANOVA.Effect ANOVA.DFn ANOVA.DFd   ANOVA.F    ANOVA.p ANOVA.p..05    ANOVA.ges
2        COND1         1        53 4.1938628 0.01947959           * 0.0070548612
3        COND2         1        53 3.6018758 0.02962809           * 0.0040817987
4  COND1:COND2         1        53 0.8371797 0.24026453             0.0008646178
Explanation: It is true that the return_aov = TRUE argument adds an aov object, which can provide similar results when used as summary(aov$aov). However, the result is still not a dataframe, and trying as.data.frame(summary(aov$aov)) outputs Error in as.data.frame.default(summary(aov$aov)) : cannot coerce class ‘"summary.aovlist"’ to a data.frame.
Hopefully, this provides the outcome you were looking for.
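The "ANOVA." prefix on the column names above suggests that the result is a list whose ANOVA element is already a data frame, so that element could also be pulled out directly. A minimal sketch under that assumption follows; res is a hand-built stand-in for the ezANOVA result (so the example runs without the ez package) and reuses the values printed above.

```r
# Stand-in for an ezANOVA result: a list holding a data frame named ANOVA.
# Assumption: the real return value is structured like this.
res <- list(ANOVA = data.frame(
  Effect = c("COND1", "COND2", "COND1:COND2"),
  DFn = c(1, 1, 1),
  DFd = c(53, 53, 53),
  F = c(4.1938628, 3.6018758, 0.8371797),
  p = c(0.01947959, 0.02962809, 0.24026453),
  ges = c(0.0070548612, 0.0040817987, 0.0008646178)
))

# The element is a regular data frame, so normal subsetting applies
anova_df <- res$ANOVA
significant <- anova_df[anova_df$p < 0.05, c("Effect", "F", "p")]
print(significant)
```

With the real object this avoids the prefixed column names that as.data.frame() produces on the whole list.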
I'm trying to extract OLS coefficients from a set of models that are run under lapply.
The problem is that not all submodels in my list have all levels, and most of the time I end up indexing coefficients that are out of bounds. For example, in the code below the "No Reward" option is not present for Reward2, so result0$coef[3,1] will be out of bounds, as no third estimate is reported.
Question: is it possible to force lm to report all the coefficients specified in the model even if no estimate is available?
I would like to apologize to the community for not presenting reproducible code in my earlier attempt. Since then I have solved the problem by checking within the function for the presence of a particular estimate, but the question still remains. Here is the code:
RewardList<-c("Reward1","Reward2")
set.seed(1234)
GL<-rnorm(10)
RewardName<-rep(c("Reward1","Reward2"),each=5)
NoReward<-c(0,1,0,1,0, 0,0,0,0,0)
Under1<- c(1,0,0,0,1, 0,1,0,1,0)
Above1<- c(0,0,1,0,0, 1,0,1,0,1)
tinput<-as.data.frame(RewardName)
tinput<-cbind(tinput,GL,NoReward,Under1,Above1)
regMF <- lapply(seq_along(RewardList),
function (n) {
tinput <- tinput[tinput$RewardName==RewardList[n],]
result0 <- summary(lm(GL~NoReward+Under1+Above1-1,tinput))
result1 <- result0$coef[1,1] #no rebate
result2 <- result0$coef[2,1]
result3 <- result0$coef[3,1]
return(list(result1,result2,result3))})
To make your function more robust, you could use tryCatch or use dplyr::failwith. For instance, replacing
result0 <- summary(lm(GL~`No Reward`+`Under 1%`-1,tinput))
with
require(dplyr)
result0 <- failwith(NULL, function(d) summary(lm(GL~`No Reward`+`Under 1%`-1,d)))(tinput)
may work, though it is difficult to tell without a reproducible example. On how to produce a reproducible example, please have a look here.
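For the tryCatch route, a base-R sketch using the data from the question might look like this. The helper safe_coef is a hypothetical name, and looking estimates up by name rather than by position is my own choice: when a term is dropped, the remaining rows of the coefficient table shift up, so position 1 no longer means NoReward.

```r
# Data as given in the question
set.seed(1234)
tinput <- data.frame(
  RewardName = rep(c("Reward1", "Reward2"), each = 5),
  GL = rnorm(10),
  NoReward = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0),
  Under1 = c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0),
  Above1 = c(0, 0, 1, 0, 0, 1, 0, 1, 0, 1)
)

# Hypothetical helper: return the estimate for a term, or NA if it is absent
safe_coef <- function(coef_table, term) {
  tryCatch(coef_table[term, 1], error = function(e) NA_real_)
}

# Reward2 has no "NoReward" observations, so that row is missing from the table
sub <- tinput[tinput$RewardName == "Reward2", ]
tab <- summary(lm(GL ~ NoReward + Under1 + Above1 - 1, data = sub))$coef

no_reward <- safe_coef(tab, "NoReward")  # NA instead of an error
under1 <- safe_coef(tab, "Under1")
```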
Although not exactly your question, I would like to point out a potentially easier way to arrange your data using broom, which avoids the list of lists-output structure of your approach. With broom and dplyr, you can collect the output of your models in a dataframe for easier access. For instance, have a look at the output of
library(dplyr)
library(broom)
mtcars %>% group_by(gear) %>% do(data.frame(tidy(lm(mpg~cyl, data=.), conf.int=T)))
Here, you can wrap lm in failwith as well.
As I said, use lmList:
library(nlme)
fits <- lmList(GL~NoReward+Under1+Above1-1 | RewardName, data = tinput)
coef(fits)
# NoReward Under1 Above1
#Reward1 -1.034134 -0.3889705 1.0844412
#Reward2 NA -0.5695960 -0.3102046
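If you would rather stay in base R, the same table can be sketched with coef() applied to each lm fit: unlike the coefficient table from summary(), coef() keeps the terms that could not be estimated and reports them as NA. The helper name fit_one is hypothetical; the data are those from the question.

```r
# Data as given in the question
set.seed(1234)
tinput <- data.frame(
  RewardName = rep(c("Reward1", "Reward2"), each = 5),
  GL = rnorm(10),
  NoReward = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0),
  Under1 = c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0),
  Above1 = c(0, 0, 1, 0, 0, 1, 0, 1, 0, 1)
)

# Hypothetical helper: fit one subset and return all coefficients, NA included
fit_one <- function(reward) {
  sub <- tinput[tinput$RewardName == reward, ]
  coef(lm(GL ~ NoReward + Under1 + Above1 - 1, data = sub))
}

# One row per reward, one column per term
coefs <- t(sapply(c("Reward1", "Reward2"), fit_one))
print(coefs)
```

Because every fit returns a vector of the same length, the results can be indexed by term name without any bounds checking.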
I've been working on a project for a little bit for a homework assignment and I've been stuck on a logistical problem for a while now.
What I have at the moment is a list that returns 10000 values in the format:
[[10000]]
X-squared
0.1867083
(This is the 10000th value of the list)
What I really would like is to just have the chi-squared value alone so I can do things like create a histogram of the values.
Is there any way I can do this? I'm fine with repeating the test from the start if necessary.
My current code is:
nsims <- 10000
cancer.cells <- c(rep("M", 24), rep("B", 13))
malig <- numeric(nsims)   # pre-allocate the simulated counts
for (i in 1:nsims) {
  malig[i] <- sum(sample(cancer.cells, 21) == "M")
}
benign <- 21 - malig
rbenign <- 13 - benign
rmalig <- 24 - malig
cancerchi <- list()       # the test results end up in a list
for (i in 1:nsims) {
  test <- cbind(c(rbenign[i], benign[i]), c(rmalig[i], malig[i]))
  cancerchi[i] <- chisq.test(test, correct = FALSE)
}
It gives me all I need, I just cannot perform follow-up analysis on it such as creating a histogram.
Thanks for taking the time to read this!
I'll provide an answer at the suggestion of @Dr. Mike.
hist requires a numeric vector as input. hist(cancerchi) does not work because cancerchi is a list, not a vector.
There are several ways to convert cancerchi from a list into a format that hist can work with. The quickest is to unlist it inline:
hist(unlist(cancerchi))
Note that if you do not reassign cancerchi, it will still be a list and cannot be passed directly to hist.
# i.e
class(cancerchi)
hist(cancerchi) # will still give you an error
If you reassign, it can be another type of object:
(class(cancerchi2 <- unlist(cancerchi)))
(class(cancerchi3 <- as.data.frame(unlist(cancerchi))))
# using the ldply function in the plyr package
library(plyr)
(class(cancerchi4 <- ldply(cancerchi)))
These new objects can be passed to hist directly:
hist(cancerchi2)
hist(cancerchi3[,1]) # specify column because cancerchi3 is a data frame, not a vector
hist(cancerchi4[,1]) # specify column because cancerchi4 is a data frame, not a vector
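Here is a small self-contained sketch of the unlist route. The list is simulated with rchisq() purely to mimic the shape of cancerchi (one named number per element); it is not the output of the original simulation.

```r
# Simulated stand-in with the same shape as cancerchi:
# a list whose elements are single numbers named "X-squared"
set.seed(1)
cancerchi <- lapply(1:100, function(i) c("X-squared" = rchisq(1, df = 1)))

# Flatten the list into a plain numeric vector
chi_values <- unlist(cancerchi, use.names = FALSE)

# Stricter alternative: fails loudly if an element is not a single number
chi_values_checked <- vapply(cancerchi, function(x) x[[1]], numeric(1))

# Compute the histogram; drop plot = FALSE to draw it
h <- hist(chi_values, plot = FALSE)
```

vapply() is worth the extra typing when the list might contain something unexpected, because it checks each element instead of silently coercing.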
A little extra information: other useful commands for looking at your objects include str and attributes.
I'm trying to run a regression for every zipcode in my dataset and save the coefficients to a data frame but I'm having trouble.
Whenever I run the code below, I get a data frame called "coefficients" containing every zip code but with the intercept and coefficient for every zipcode being equal to the results of the simple regression lm(Sealed$hhincome ~ Sealed$square_footage).
When I run the code as indicated in Ranmath's example at the link below, everything works as expected. I'm new to R after many years with STATA, so any help would be greatly appreciated :)
R extract regression coefficients from multiply regression via lapply command
library(plyr)
Sealed <- read.csv("~/Desktop/SEALED.csv")
x <- function(df) {
lm(Sealed$hhincome ~ Sealed$square_footage)
}
regressions <- dlply(Sealed, .(Sealed$zipcode), x)
coefficients <- ldply(regressions, coef)
Because dlply takes a ... argument that allows additional arguments to be passed to the function, you can make things even simpler:
dlply(Sealed,.(zipcode),lm,formula=hhincome~square_footage)
The first two arguments to lm are formula and data. Since formula is specified here, lm will pick up the next argument it is given (the relevant zipcode-specific chunk of Sealed) as the data argument ...
You are applying the function:
x <- function(df) {
lm(Sealed$hhincome ~ Sealed$square_footage)
}
to each subset of your data, so we shouldn't be surprised that the output each time is exactly
lm(Sealed$hhincome ~ Sealed$square_footage)
right? Try replacing Sealed with df inside your function. That way you're referring to the variables in each individual piece passed to the function, not the whole variable in the data frame Sealed.
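To make that concrete, here is a sketch of the corrected function. It uses base R's split() and lapply() in place of dlply/ldply so that it runs without plyr, and the Sealed data frame is simulated (the real CSV is not available), with a different slope per zipcode so the effect is visible. The names fit_zip and coefs are hypothetical.

```r
# Simulated stand-in for the Sealed data: three zipcodes with different slopes
set.seed(42)
Sealed <- data.frame(
  zipcode = rep(c("10001", "10002", "10003"), each = 20),
  square_footage = runif(60, 500, 3000)
)
slope <- unname(c("10001" = 10, "10002" = 20, "10003" = 30)[Sealed$zipcode])
Sealed$hhincome <- 20000 + slope * Sealed$square_footage + rnorm(60, sd = 1000)

# The function now uses its argument: columns are looked up in df, not in Sealed
fit_zip <- function(df) {
  lm(hhincome ~ square_footage, data = df)
}

# One regression per zipcode, then one row of coefficients per zipcode
regressions <- lapply(split(Sealed, Sealed$zipcode), fit_zip)
coefs <- do.call(rbind, lapply(regressions, coef))
print(coefs)
```

Passing the subset through the data argument, rather than writing df$hhincome in the formula, also keeps the coefficient names short.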
The issue is not with plyr but rather in the definition of the function. You define a function with an argument, but never use that argument in its body.
As an analogy,
myFun <- function(x) {
3 * 7
}
> myFun(2)
[1] 21
> myFun(578)
[1] 21
If you run this function on different values of x, it will still give you 21, no matter what x is. That is, there is no reference to x within the function. In my silly example, the correction is obvious; in your function above, the confusion is understandable. The $hhincome and $square_footage should conceivably serve as variables.
But you want your x to vary over what comes before the $. As @Joran correctly pointed out, swap Sealed$hhincome with df$hhincome (and same for $squ..) and that will help.