I am very new to R, and this is my first time encountering the eval() function. I am trying to use the med and boot.med functions from the mma package to conduct mediation analysis. med and boot.med take in models such as linear models, along with data frames that specify mediators and predictors, and then estimate the mediation effect of each mediator.
The author of the package gives the flexible option of specifying one's own custom.function. From the source code of med, it can be seen that custom.function is passed to eval(). So I tried to insert the gbmt function as the custom function. However, R kept giving me the error message: Error during wrapup: Number of trees to be used in prediction must be provided. I have been searching online for days and have tried many ways of specifying the number of trees parameter n.trees, but nothing works (I believe others have raised similar issues: post 1, post 2).
The following code is part of the source code of the med function:
cf1 = gsub("responseY", "y[,j]", custom.function[j])
cf1 = gsub("dataset123", "x2", cf1)
cf1 = gsub("weights123", "w", cf1)
full.model[[j]] <- eval(parse(text = cf1))
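To make the mechanism concrete, here is a minimal sketch of the same substitute-then-evaluate pattern using only base R. The wrapper run_custom and the toy data are hypothetical stand-ins, not part of mma, and lm takes the place of glm or gbmt. Because eval() is called inside the function, the names x2, y, w and j are resolved there, and the same lookup applies to any other object named in the string.

```r
# Hypothetical stand-in for the relevant lines of med(); not the mma source.
run_custom <- function(custom.function, x2, y, w) {
  j <- 1
  # Replace the placeholders with the names of the local objects
  cf1 <- gsub("responseY", "y[,j]", custom.function[j])
  cf1 <- gsub("dataset123", "x2", cf1)
  cf1 <- gsub("weights123", "w", cf1)
  # Parse the resulting string and evaluate it in this function's frame
  list(call_text = cf1, model = eval(parse(text = cf1)))
}

set.seed(1)
x2 <- data.frame(a = rnorm(50), b = rnorm(50))
y <- matrix(2 * x2$a - x2$b + rnorm(50), ncol = 1)
w <- rep(1, 50)

res <- run_custom("lm(responseY ~ ., data = dataset123, weights = weights123)",
                  x2, y, w)
print(res$call_text)
print(coef(res$model))
```

Printing the substituted string before it is evaluated is a cheap way to check exactly what call ends up being run.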
One custom function example the author gives in the package documentation is as follows:
temp1<-med(data=data.bin,n=2,custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
Here glm is the custom function. This example code works and you can replicate it easily (if you have mma installed and loaded). However, when I try to use the gbmt function on a survival object, I get errors. Here is what my code looks like:
temp1 <- med(data = data.surv,n=2,type = "link",
custom.function = 'gbmt(responseY ~.,
data = dataset123,
distribution = dist,
train_params = start_stop,
cv_folds=10,
keep_gbm_data = TRUE,
)')
Does anyone have any idea how the number of trees argument n.trees can be added somewhere in the above code?
Many thanks in advance!
Update: in order to replicate the example code, please install mma and try the following:
library("mma")
data("weight_behavior") ##binary x #binary y
x=weight_behavior[,c(2,4:14)]
pred=weight_behavior[,3]
y=weight_behavior[,15]
data.bin<-data.org(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10), binref=c(1,1),catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4)
temp1<-med(data=data.bin,n=2) #or use self-defined final function
temp1<-med(data=data.bin,n=2, custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
I changed the custom.function to gbmt and used a survival object as responseY and the error occurs. When I use the gbmt function on my data outside the med function, there is no error.
I want to increase the sample size being considered in a power analysis I'm running using simr. With my pilot data of 5 participants, I am able to run the power analysis, but when I use the extend function to increase the number of subjects to 20, I get: Error in (function (classes, fdef, mtable): unable to find an inherited method for function ‘extend’ for signature ‘"lmerModLmerTest"’. The extend function does not seem to be working on my model.
I get the same error using the following code, taken from an example online:
#load in the data
sleep_df = lme4::sleepstudy %>%
clean_names()
#set up the model
y_var = "reaction"
fixed_effect = "days"
random_effect = "subject"
model_form = as.formula(paste0(y_var, " ~ ", fixed_effect, " + ", "(1|", random_effect, ")"))
print(model_form)
#run simulation
set.seed(1)
sleep_fit = lmer(model_form,
data = sleep_df)
model_form2 <- extend(sleep_fit, along="subject", n=20)
model_form2
Any insight would be appreciated!
Off the top of my head, I can think of two possible causes:
Your subject variable is not specified as an integer, but as a factor. extend() only works on linear variables. However, since you reproduce the error with an example known to work, I think we can disregard this.
The problem is not with the data, but with your R session. For example, if after simr you load another package that also has a function named extend(), then simr::extend() will be masked by the second one. This should show up when you load the package: a message like The following object is masked from 'package:simr' would be printed in the console. To solve this, either write simr::extend() explicitly when you want to use this function, or change the order in which you load your packages.
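As a rough illustration of masking that runs without simr installed, the sketch below masks stats::sd with a function of the same name and then reaches the original through its namespace. The function sd defined here is purely a toy; the same reasoning would apply to extend() and simr::extend().

```r
# Define a function that shares its name with stats::sd, which masks it.
sd <- function(x) "masked version"

masked_result <- sd(c(1, 2, 3))            # calls the new definition
qualified_result <- stats::sd(c(1, 2, 3))  # always calls the stats version

# find() lists every attached location that defines the name, in search order.
locations <- find("sd")
print(locations)

rm(sd)  # remove the masking definition so sd() refers to stats::sd again
```

Running find("extend") in the affected session should show in the same way whether more than one attached package defines extend.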
Hope that helps somehow.
I have been trying to run an example code for supervised kohonen SOMs from https://clarkdatalabs.github.io/soms/SOM_NBA . When I tried to predict test set data I got the following error:
pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing)
Error in FUN(X[[i]], ...) :
Data type not allowed: should be a matrix or a factor
I tried newdata = as.matrix(NBA.testing) but it did not help. Neither did as.factor().
Why does it happen? And how can I fix that?
You should pass one more argument to the predict function, namely "whatmap", and set its value to 1.
The code would look like this:
pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing, whatmap = 1)
To verify the prediction result, you can check using:
table(NBA$Pos[-training_indices], pos.prediction$predictions[[2]], useNA = 'always')
The result may differ from that of the tutorial, since the tutorial does not call set.seed().
I suggest calling set.seed() with an arbitrary number somewhere before the training phase.
For simplicity, put it once at the very top of your script, e.g.
set.seed(12345)
This will guarantee a reproducible result of your model next time you re-run your script.
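A quick base R check of that behaviour, using runif() as a stand-in for the random initialisation done during SOM training (the variable names are only for illustration):

```r
set.seed(12345)
first_run <- runif(3)

set.seed(12345)          # same seed again
second_run <- runif(3)   # reproduces first_run exactly

unseeded <- runif(3)     # continues the random stream, so it differs

print(identical(first_run, second_run))
print(identical(first_run, unseeded))
```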
Hope that will help.
I'm trying to bag conditional inference trees following the advice of Kuhn et al in 'Applied Predictive Modeling', Ch.8:
Conditional inference trees can also be bagged using the cforest function in the party package if the argument mtry is equal to the number of predictors:
library(party)
## The mtry parameter should be the number of predictors (the
## number of columns minus 1 for the outcome).
bagCtrl <- cforest_control(mtry = ncol(trainData) - 1)
baggedTree <- cforest(y ~ ., data = trainData, controls = bagCtrl)
Note there may be a typo in the above code (and also in the package's help file), as discussed here:
R package 'partykit' unused argument in ctree_control
However when I try to replicate this code using a dataframe (and trainData in above code is also a dataframe) such that there is more than one independent/predictor variable, I'm getting an error though it works for just one independent variable:
Some dummy code for simulations:
library(party)
df = data.frame(y = runif(5000), x = runif(5000), z = runif(5000))
bagCtrl <- cforest_control(mtry = ncol(df) - 1)
baggedTree_cforest <- cforest(y ~ ., data = df, control = bagCtrl)
The error message is:
Error: $ operator not defined for this S4 class
Thanks for any help.
As suggested, posting my comment from above as an answer as a general R 'trick' if something expected doesn't work and the program has several libraries loaded:
but what solved it was adding the party namespace explicitly to the function call, so party::cforest() instead of just cforest(). I've also got library(partykit) loaded in my actual program which too has a cforest() function and the error could be stemming from there though both functions are essentially the same
caret::train() is another example where this often pops up
AIM: The aim here was to find a suitable fit, using step functions, which uses age to describe wage, in the Wage dataset in the library ISLR.
PLAN:
To find a suitable fit, I'll try multiple fits with different cut points. I'll use the glm() function for the fitting, and to check which fit is best I'll use the cv.glm() function (from the boot library) to cross-validate each fitted model.
PROBLEM:
In order to do so, I did the following:
all.cvs = rep(NA, 10)
for (i in 2:10) {
lm.fit = glm(wage~cut(Wage$age,i), data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
But this gives an error:
Error in model.frame.default(formula = wage ~ cut(Wage$age, i), data =
list( : variable lengths differ (found for 'cut(Wage$age, i)')
Whereas the code given below runs without error (it can be found here):
all.cvs = rep(NA, 10)
for (i in 2:10) {
Wage$age.cut = cut(Wage$age, i)
lm.fit = glm(wage~age.cut, data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
Hypotheses and Results:
Well, it is possible that cut() and glm() do not work together. But this works:
glm(wage~cut(age,4),data=Wage)
Question:
So, basically, the working version calls cut(), saves its result in a variable, and then uses that variable in glm(). But we can't put the cut() call inside the glm() call, and that only fails when the code is in a loop.
So, why is the first version of the code not working?
This is confusing. Any help appreciated.
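One likely explanation, sketched here with toy data rather than the Wage dataset: cross-validation refits the model on subsets of the rows, but a term written as cut(Wage$age, i) always pulls the full-length column from the global Wage object, whatever is passed as data. The data frame d and the 80-row subset below are assumptions made for illustration only.

```r
set.seed(1)
d <- data.frame(x = runif(100), y = rnorm(100))
train <- d[1:80, ]  # stands in for one cross-validation training fold

# d$x ignores the data argument: 100 values against 80 values of y
err <- tryCatch(
  model.frame(y ~ cut(d$x, 3), data = train),
  error = function(e) conditionMessage(e)
)
print(err)

# A bare column name is looked up in data, so the lengths agree
ok <- model.frame(y ~ cut(x, 3), data = train)
print(nrow(ok))
```

Storing the cut result as a column, as in the working version, also keeps the same break points for every fold, whereas cut(age, i) inside the formula would recompute them on each subset.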
I'm trying to write a function for a task I need to do many times (running a cox proportional hazards function over multiple imputed datasets). When I pass the necessary objects to my user-defined function, however, it gives an error, stating that the object cannot be found. I think this is because the object is defined within a dataframe that is specified with the "data=" argument within the cch() function. Can anyone help me with this?
Example data:
my.list<-list(my.df1 <- data.frame(my.id = 1:100, my.time = rlnorm(100),
my.event= c(rbinom(50,1,0.2),rep(1,50)), my.det=rbinom(100,1,0.5),
sub= c(rep(1,50), rbinom(50, 1, 0.1))), my.df2 <- data.frame(my.id = 1:100,
my.time = rlnorm(100), my.event= c(rbinom(50,1,0.2),rep(1,50)),
my.det=rbinom(100,1,0.5), sub= c(rep(1,50), rbinom(50, 1, 0.1))))
Outside my user-defined function, this works:
library(KMsurv)
library(survival)
cch(Surv(my.time,my.event)~as.factor(my.det), data=my.df1, subcoh=~sub,
id=~my.id, cohort.size=500)
However, this does not work (this is an example function, not the real function as the real function is more complex and runs analyses on multiple datasets, then combines them):
myfun<-function(dflist,time,event){
for (i in 1:length(dflist)){
out<-cch(Surv(time,event)~as.factor(my.det), data=dflist[[i]],
subcoh=~sub, id=~my.id, cohort.size=500)
print(out)}
}
myfun(my.list,my.time,my.event)
I get this error: "Error in Surv(time, event) : object 'my.time' not found".
I found some posts about using an eval(substitute()) function to deal with a similar problem, but I can't get it to work. Any suggestions are greatly appreciated!
Try this. Inside myfun, R tries to evaluate my.time and my.event as objects, but they only exist as columns of the data frames. Pass the column names as quoted strings and use them to pull the columns out of each data frame before handing them to Surv:
myfun<-function(dflist,time,event){
for (i in seq_along(dflist)){
# time and event are column names given as strings
out<-cch(Surv(dflist[[i]][, time], dflist[[i]][, event])~as.factor(my.det), data=dflist[[i]],
subcoh=~sub, id=~my.id, cohort.size=500)
print(out)}
}
myfun(my.list,"my.time","my.event")
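A related pattern, shown here as a sketch with lm() and made-up data instead of cch(), is to build the formula itself from the column-name strings with reformulate(). The function fit_each and the toy data frames are hypothetical, and whether this carries over cleanly to cch() with a Surv() response has not been checked here.

```r
# Hypothetical helper: fit the same model to every data frame in a list,
# with the response and predictor given as column-name strings.
fit_each <- function(dflist, response, predictor) {
  form <- reformulate(predictor, response = response)
  lapply(dflist, function(d) lm(form, data = d))
}

set.seed(42)
toy.list <- list(
  data.frame(my.out = rnorm(30), my.det = rbinom(30, 1, 0.5)),
  data.frame(my.out = rnorm(30), my.det = rbinom(30, 1, 0.5))
)

fits <- fit_each(toy.list, "my.out", "my.det")
print(lapply(fits, coef))
```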