loocv <- glm(data_set, tree)$delta[1]
I haven't received a result; instead this error pops up: Error in glm(data_set, tree) : 'family' not recognized
2.
stop("'family' not recognized")
1.
glm(data_set, tree)
So the syntax for glm is as follows:
glm(formula, family = gaussian, data, weights, subset,
    na.action, start = NULL, etastart, mustart, offset,
    control = list(...), model = TRUE, method = "glm.fit",
    x = FALSE, y = TRUE, singular.ok = TRUE, contrasts = NULL, ...)
Note that the second argument is family. tree is not an acceptable family; the allowed values are:
binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
Were you trying to fit a tree-based model such as a CART? If so, I recommend using a package like tree or rpart to fit it to your data.
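The `$delta[1]` in the original call suggests that leave-one-out cross-validation with `cv.glm()` from the boot package may have been the goal; that is a guess, since the question does not say so. A minimal sketch under that assumption, with the built-in `mtcars` data standing in for `data_set` and an rpart tree fitted alongside for the tree-based case:

```r
library(boot)   # provides cv.glm(); a recommended package shipped with R
library(rpart)  # CART-style trees; also a recommended package

# mtcars stands in for data_set (assumption: the real data are not shown)
fit <- glm(mpg ~ wt + hp, data = mtcars)  # family defaults to gaussian

# With K left at its default (the number of rows), cv.glm() performs
# leave-one-out CV; delta[1] is the raw estimate of prediction error.
loocv <- cv.glm(mtcars, fit)$delta[1]
print(loocv)

# A regression tree on the same formula
tree_fit <- rpart(mpg ~ wt + hp, data = mtcars)
print(class(tree_fit))
```

Note that `cv.glm()` takes the data first and a fitted glm object second, which matches the argument order in the failing call.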
I am using caret's train function inside my own function to train multiple models. Because caret cannot handle the quotation marks in the input, I tried removing the quotes with base R's noquote function. I cannot strip the quotes beforehand, because other parts of the function need the input as a quoted string. Thanks in advance!
Code:
i <- "Dax.Shepard"
celeg_lgr = train(noquote(i) ~ ., method = "glm",
family = binomial(link = "logit"), data = celeb_trn,
trControl = trainControl(method = 'cv', number = 5))
Running this code results in the following error:
Error in model.frame.default(form = op ~ ., data = celeb_trn, na.action = na.fail) :
variable lengths differ (found for 'Dax.Shepard')
PS.
Running the code like this does not result in any error:
celeg_lgr = train(Dax.Shepard ~ ., method = "glm",
family = binomial(link = "logit"), data = celeb_trn,
trControl = trainControl(method = 'cv', number = 5))
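One common way around this is to build the formula from the string instead of trying to unquote it. The sketch below is only an illustration: it uses base R's `glm()` and three columns of the built-in `mtcars` data as stand-ins for caret's `train()` and `celeb_trn`, which are not available here.

```r
# Stand-in data (assumption): three columns of mtcars replace celeb_trn
trn <- mtcars[, c("am", "wt", "hp")]
i <- "am"  # response name held as a string, like "Dax.Shepard" in the question

# noquote() only changes how a string prints. Inside a formula, noquote(i)
# is evaluated as a length-one value, hence "variable lengths differ".
# Pasting the name into a string and converting it gives a real formula.
form <- as.formula(paste(i, "~ ."))
print(form)

fit <- glm(form, family = binomial(link = "logit"), data = trn)
print(names(coef(fit)))
```

The same `form` object can be passed as the first argument of `train()` in place of the literal formula, while `i` stays available as a quoted string elsewhere in the function.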
I have a data frame that consists of numerical and non-numerical variables. I am trying to fit a logistic regression model predicting my variable "risk" from all other variables, optimizing AUC using 6-fold cross-validation.
However, I want to center and scale only the numerical explanatory variables. My code raises no errors or warnings, but I cannot figure out how to tell train() through preProcess (or in some other way) to center and scale just my numerical variables.
Here is the code:
test <- train(risk ~ .,
method = "glm",
data = df,
family = binomial(link = "logit"),
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv",
number = 6,
classProbs = TRUE,
summaryFunction = prSummary),
metric = "AUC")
You could preprocess all numerical variables in the original df first and then apply the train function to the scaled df:
library(dplyr)
library(caret)
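df <- df %>%
  # scale() returns a one-column matrix, so convert each result back to a numeric vector
  dplyr::mutate_if(is.numeric, ~ as.numeric(scale(.x)))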
test <- train(risk ~ .,
method = "glm",
data = df,
family = binomial(link = "logit"),
trControl = trainControl(method = "cv",
number = 6,
classProbs = TRUE,
summaryFunction = prSummary),
metric = "AUC")
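For reference, the same "scale only the numeric columns" step can be sketched in base R and checked directly. The data frame below is hypothetical toy data, not the asker's `df`. Keep in mind that scaling the whole data frame up front uses statistics computed from all rows, including those later held out in each fold.

```r
# Toy data frame (hypothetical) mixing numeric and non-numeric columns
df <- data.frame(
  risk  = factor(c("yes", "no", "yes", "no", "yes", "no")),
  age   = c(23, 35, 47, 51, 62, 30),
  score = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.8),
  group = c("a", "b", "a", "c", "b", "c")
)

# Logical mask of the numeric columns
num_cols <- vapply(df, is.numeric, logical(1))

# scale() returns a one-column matrix; as.numeric() turns it back into a vector
df[num_cols] <- lapply(df[num_cols], function(col) as.numeric(scale(col)))

print(round(colMeans(df[num_cols]), 10))     # column means after centering
print(vapply(df[num_cols], sd, numeric(1)))  # standard deviations after scaling
```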
I am using the following code to fit and test a random forest classification model:
control <- trainControl(method = 'repeatedcv',
                        number = 5, repeats = 3,
                        search = 'grid')
tunegrid <- expand.grid(.mtry = (1:12))
rf_gridsearch <- train(y = river$stat_bino,
                       x = river[, colnames(river) != "stat_bino"],
                       data = river,
                       method = 'rf',
                       metric = 'Accuracy',
                       ntree = 600,
                       importance = TRUE,
                       tuneGrid = tunegrid, trControl = control)
Note, I am using
train(y = river$stat_bino, x = river[,colnames(river) != "stat_bino"],...
rather than: train(stat_bino ~ .,...
so that my categorical variables will not be turned into dummy variables.
(The solution comes from: variable encoding in K-fold validation of random forest using package 'caret'.)
I would like to extract the finalModel and use it to make partial dependence plots for my variables (using the code below), but I get an error message and don't know how to fix it.
model1 <- rf_gridsearch$finalModel
library(pdp)
partial(model1, pred.var = "MAXCL", type = "classification", which.class = "1", plot = TRUE)
Error in eval(stats::getCall(object)$data) :
..1 used in an incorrect context, no ... to look in
Thanks for any solutions here!
I would like to plot an interaction effect for two continuous variables in a multilevel model. The model is the following:
# Columns are looked up in `data`, so the final$ prefixes are dropped;
# non-syntactic column names need backticks.
# optimizer = "optimx" requires the optimx package to be installed and loaded.
lme4::glmer(trust ~ educ.c +
              age.c +
              gender +
              `C14 RESPONDENT_OCCUPATION_SCALE` +
              tot_country.c +
              int_use.c +
              educ.c * acc.c +
              (educ.c | Country),
            data = final,
            family = binomial,
            weights = `W85_WEIGHT_EU27_+_TR_+_HR_+_NO_+_CH_+_IS`, nAGQ = 0,
            control = lme4::glmerControl(optimizer = "optimx", calc.derivs = FALSE, optCtrl = list(method = "nlminb", starttests = FALSE, kkt = FALSE)))
The interaction I am interested in is the one between educ.c and acc.c. The dependent variable is dichotomous, and after consulting various websites and similar questions I am still struggling with it. Does anyone have a suggestion about how to do it?
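One generic approach is to predict probabilities over a grid of one variable at a few fixed values of the other (for example the mean and one standard deviation either side) and draw one line per value. The sketch below illustrates this with simulated data and a plain `glm()` as a simplified stand-in for the multilevel model; the coefficients and data are invented for illustration. For a fitted glmer model, `predict()` with `re.form = NA` should give the corresponding population-level predictions.

```r
set.seed(42)

# Simulated stand-in data (assumption): two centered continuous predictors
n <- 500
d <- data.frame(educ.c = rnorm(n), acc.c = rnorm(n))
eta <- -0.2 + 0.5 * d$educ.c + 0.3 * d$acc.c + 0.4 * d$educ.c * d$acc.c
d$trust <- rbinom(n, 1, plogis(eta))

fit <- glm(trust ~ educ.c * acc.c, family = binomial, data = d)

# Grid: 50 values of educ.c at three levels of the moderator acc.c
grid <- expand.grid(
  educ.c = seq(min(d$educ.c), max(d$educ.c), length.out = 50),
  acc.c  = mean(d$acc.c) + c(-1, 0, 1) * sd(d$acc.c)
)
grid$prob <- predict(fit, newdata = grid, type = "response")

# expand.grid varies educ.c fastest, so each matrix column is one acc.c level
probs <- matrix(grid$prob, nrow = 50)
matplot(unique(grid$educ.c), probs, type = "l", lty = 1:3, col = "black",
        xlab = "educ.c", ylab = "Predicted probability of trust")
legend("topleft", legend = c("acc.c: -1 SD", "acc.c: mean", "acc.c: +1 SD"),
       lty = 1:3)
```

Lines that fan out or cross across the range of educ.c are the visual signature of the interaction.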
This question appears to have been asked before but was correctly closed as off-topic. I'm now experiencing the same issue and figured that Stack Overflow is a better place for it.
I want to use glmnet's warm start for selecting lambda to speed up the model building process, but I also want to keep using tuneGrid from caret in order to supply a large sequence of alphas (the default alpha range is too narrow). The following attempt returns the error: Error: The tuning parameter grid should have columns alpha, lambda
fitControl <- trainControl(method = 'cv', number = 10, classProbs = TRUE, summaryFunction = twoClassSummary)
tuneGridb <- expand.grid(.alpha = seq(0, 1, 0.05))
model.caretb <- caret::train(y ~ x1 + x2 + x3, data=train, method="glmnet",
family = "binomial", trControl = fitControl,
tuneGrid = tuneGridb, metric = "ROC")
How can I supply a range of values for alpha via caret whilst using the glmnet default lambda selection process?
If you check the default grid search method for the glmnet model in caret, you will notice that when a grid search is requested without an actual grid, caret provides alpha values with:
alpha = seq(0.1, 1, length = len)
while lambda values will be provided by the glmnet "warm start" at alpha = 0.5:
init <- glmnet::glmnet(Matrix::as.matrix(x), y,
family = fam,
nlambda = len+2,
alpha = .5)
lambda <- unique(init$lambda)
lambda <- lambda[-c(1, length(lambda))]
lambda <- lambda[1:min(length(lambda), len)]
so if you do:
library(caret)
library(mlbench)
data(Sonar)
fitControl <- trainControl(method = 'cv',
number = 10,
classProbs = TRUE,
summaryFunction = twoClassSummary,
search = "grid")
model.caret <- caret::train(Class~ .,
data = Sonar,
method="glmnet",
family = "binomial",
trControl = fitControl,
tuneLength = 20,
metric = "ROC")
you will get a grid of 400 combinations rather than 20: 20 lambda values for each of the 20 alpha values:
nrow(model.caret$results)
#output
400
I understand this is not exactly what you are after but it is pretty close without resorting to a custom train function.
To get closer to the desired result you can manually get the range of lambda values from glmnet for each desired alpha:
lambda <- unique(unlist(lapply(seq(0, 1, 0.05), function(x){
init <- glmnet::glmnet(Matrix::as.matrix(Sonar[,1:60]), Sonar$Class,
family = "binomial",
nlambda = 100,
alpha = x)
lambda <- c(min(init$lambda), max(init$lambda))
}
)))
Then create a grid with many lambda values:
tuneGridb <- expand.grid(.alpha = seq(0, 1, 0.05),
.lambda = seq(min(lambda), max(lambda), length.out = 100))
caret is smart enough to pass the whole set of lambda values for a given alpha to glmnet in a single fit, rather than fitting a separate model for every alpha-lambda pair:
model.caret <- caret::train(Class~ .,
data = Sonar,
method="glmnet",
family = "binomial",
trControl = fitControl,
tuneGrid = tuneGridb,
metric = "ROC")
model.caret$bestTune
#output
alpha lambda
1 0 2.159367e-05
Ridge (alpha = 0) is the way to go in this case. Note that the best lambda was in fact the lowest lambda tested:
min(lambda)
#output
2.159367e-05
It may therefore be wise to explore lower lambda values in the grid than the glmnet "warm" start suggested.
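A grid that reaches lower lambda values can be sketched as follows. The bounds `lambda_lo` and `lambda_hi` are assumptions chosen for illustration, not values from the Sonar fit. The grid is spaced evenly on the log scale, as glmnet's own lambda sequences are, so the small values are covered far more densely than with a linear `seq()`.

```r
# Bounds are assumptions for illustration; in practice derive them from the
# lambda paths glmnet returns, extending the lower end below min(lambda).
lambda_lo <- 1e-6
lambda_hi <- 1

# 50 lambda values evenly spaced on the log scale
lambda_grid <- exp(seq(log(lambda_lo), log(lambda_hi), length.out = 50))

# Column names match those in caret's error message: alpha, lambda
tune_grid <- expand.grid(alpha = seq(0, 1, 0.05), lambda = lambda_grid)
print(dim(tune_grid))
print(range(tune_grid$lambda))
```

`tune_grid` can then be supplied as `tuneGrid` in the `train()` call above.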