Error when applying lsmeans to model - r

I have a dataset (data1) as below:
T,D,P,R
1,0,1,1
2,1,1,3
3,0,1,7
1,0,2,2
2,1,2,3
3,1,2,7
1,0,3,1
2,1,3,4
3,1,3,7
1,1,4,1
2,1,4,3
3,0,4,7
To start with, all variables except the response (which is R) are specified as factors (so D, T and P are factors). I then fit the model:
model1 <- lme(R~D+T,random=~1|P, method="REML",data=data1)
Then I want to use lsmeans to average across factor levels of D and P.
To do this I type:
lsm_model <- lsmeans(model1,~T)
Doing this, I get two error messages:
"Error in [.data.frame(tbl, , vars, drop = FALSE) :
undefined columns selected
And
Error in ref.grid(object = list(modelStruct = list(reStruct = list(Plot = -11.7209195395097)), :
Perhaps a 'data' or 'params' argument is needed"
Why can't I fit lsmeans to this model? I have used lsmeans many times without any problems.

Related

How can I include both my categorical and numeric predictors in my elastic net model? r

As a note beforehand, I think I should mention that I am working with highly sensitive medical data that is protected by HIPAA. I cannot share real data with dput- it would be illegal to do so. That is why I made a fake dataset and explained my processes to help reproduce the error.
I have been trying to estimate an elastic net model in r using glmnet. However, I keep getting an error. I am not sure what is causing it. The error happens when I go to train the data. It sounds like it has something to do with the data type and matrix.
I have provided a sample dataset. I first set the outcomes and certain predictors to be factors and label their levels. Next, I create an object, pred.names.min, with the column names of the predictors I want to use. Then I partition the data into training and test data frames (65% training, 35% test). With the trainControl function, I specify a few things I want to happen with the model: random search for the lambda and alpha parameters, and the leave-one-out method. I also specify that it is a classification model (categorical outcome). In the last step, I specify the training model, telling it to use all of the predictor variables in the pred.names.min object from the trainingset data frame.
library(dplyr)
library(tidyverse)
library(glmnet)
library(caret)
#creating sample dataset
df<-data.frame("BMIfactor"=c(1,2,3,2,3,1,2,1,3,2,1,3,1,1,3,2,3,2,1,2,1,3),
"age"=c(0,4,8,1,2,7,4,9,9,2,2,1,8,6,1,2,9,2,2,9,2,1),
"L_TartaricacidArea"=c(0,1,1,0,1,1,1,0,0,1,0,1,1,0,1,0,0,1,1,0,1,1),
"Hydroxymethyl_5_furancarboxylicacidArea_2"=
c(1,1,0,1,0,0,1,0,1,1,0,1,1,0,1,1,0,1,0,1,0,1),
"Anhydro_1.5_D_glucitolArea"=
c(8,5,8,6,2,9,2,8,9,4,2,0,4,8,1,2,7,4,9,9,2,2),
"LevoglucosanArea"=
c(6,2,9,2,8,6,1,8,2,1,2,8,5,8,6,2,9,2,8,9,4,2),
"HexadecanolArea_1"=
c(4,9,2,1,2,9,2,1,6,1,2,6,2,9,2,8,6,1,8,2,1,2),
"EthanolamineArea"=
c(6,4,9,2,1,2,4,6,1,8,2,4,9,2,1,2,9,2,1,6,1,2),
"OxoglutaricacidArea_2"=
c(4,7,8,2,5,2,7,6,9,2,4,6,4,9,2,1,2,4,6,1,8,2),
"AminopentanedioicacidArea_3"=
c(2,5,5,5,2,9,7,5,9,4,4,4,7,8,2,5,2,7,6,9,2,4),
"XylitolArea"=
c(6,8,3,5,1,9,9,6,6,3,7,2,5,5,5,2,9,7,5,9,4,4),
"DL_XyloseArea"=
c(6,9,5,7,2,7,0,1,6,6,3,6,8,3,5,1,9,9,6,6,3,7),
"ErythritolArea"=
c(6,7,4,7,9,2,5,5,8,9,1,6,9,5,7,2,7,0,1,6,6,3),
"hpresponse1"=
c(1,0,1,1,0,1,1,0,0,1,0,0,1,0,1,1,1,0,1,0,0,1),
"hpresponse2"=
c(1,0,1,0,0,1,1,1,0,1,0,1,0,1,1,0,1,0,1,0,0,1))
#setting variables as factors
df$hpresponse1<-as.factor(df$hpresponse1)
df$hpresponse2<-as.factor(df$hpresponse2)
df$BMIfactor<-as.factor(df$BMIfactor)
df$L_TartaricacidArea<- as.factor(df$L_TartaricacidArea)
df$Hydroxymethyl_5_furancarboxylicacidArea_2<-
as.factor(df$Hydroxymethyl_5_furancarboxylicacidArea_2)
#labeling factor levels
df$hpresponse1 <- factor(df$hpresponse1, labels = c("group1.2", "group3.4"))
df$hpresponse2 <- factor(df$hpresponse2, labels = c("group1.2.3", "group4"))
df$L_TartaricacidArea <- factor(df$L_TartaricacidArea, labels =c ("No",
"Yes"))
df$Hydroxymethyl_5_furancarboxylicacidArea_2 <-
factor(df$Hydroxymethyl_5_furancarboxylicacidArea_2, labels =c ("No",
"Yes"))
df$BMIfactor <- factor(df$BMIfactor, labels = c("<40", ">=40and<50",
">=50"))
#creating list of predictor names
pred.start.min <- which(colnames(df) == "BMIfactor"); pred.start.min
pred.stop.min <- which(colnames(df) == "ErythritolArea"); pred.stop.min
pred.names.min <- colnames(df)[pred.start.min:pred.stop.min]
#partition data into training and test (65%/35%)
set.seed(2)
n=floor(nrow(df)*0.65)
train_ind=sample(seq_len(nrow(df)), size = n)
trainingset=df[train_ind,]
testingset=df[-train_ind,]
#specifying that I want to use the leave-one-out cross-validation method
#and use "random" as the search for the elastic net
tcontrol <- trainControl(method = "LOOCV",
search="random",
classProbs = TRUE)
#training model
elastic_model1 <- train(as.matrix(trainingset[,
pred.names.min]),
trainingset$hpresponse1,
data = trainingset,
method = "glmnet",
trControl = tcontrol)
After I run the last chunk of code, I end up with this error:
Error in { :
task 1 failed - "error in evaluating the argument 'x' in selecting a
method for function 'as.matrix': object of invalid type "character" in
'matrix_as_dense()'"
In addition: There were 50 or more warnings (use warnings() to see the first
50)
I tried removing the "as.matrix" call:
elastic_model1 <- train((trainingset[, pred.names.min]),
trainingset$hpresponse1,
data = trainingset,
method = "glmnet",
trControl = tcontrol)
It still produces a similar error.
Error in { :
task 1 failed - "error in evaluating the argument 'x' in selecting a method
for function 'as.matrix': object of invalid type "character" in
'matrix_as_dense()'"
In addition: There were 50 or more warnings (use warnings() to see the first
50)
When I tried to make none of the predictors factors (but keep outcome as factor), this is the error I get:
Error: At least one of the class levels is not a valid R variable name; This
will cause errors when class probabilities are generated because the
variables names will be converted to X0, X1 . Please use factor levels that
can be used as valid R variable names (see ?make.names for help).
How can I fix this? How can I use my predictors (both the numeric and categorical ones) without producing an error?
glmnet does not handle factors well. The recommendation currently is to dummy code and re-code to numeric where possible:
Using LASSO in R with categorical variables
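As a rough illustration of the dummy coding mentioned above, base R's model.matrix() expands factor columns into 0/1 indicator columns and leaves numeric columns as they are. This is only a sketch on a small made-up data frame (the toy object and its column names are placeholders, not the data from the question):

```r
# Toy data: one factor predictor, one numeric predictor, a two-level outcome.
# Outcome levels are valid R names, which caret requires when classProbs = TRUE.
toy <- data.frame(
  bmi     = factor(c("low", "mid", "high", "mid", "low", "high"),
                   levels = c("low", "mid", "high")),
  age     = c(31, 45, 52, 38, 27, 60),
  outcome = factor(c("no", "yes", "yes", "no", "no", "yes"))
)

# Expand factors into indicator columns; [, -1] drops the intercept column.
x <- model.matrix(~ bmi + age, data = toy)[, -1, drop = FALSE]
y <- toy$outcome

colnames(x)    # "bmimid" "bmihigh" "age"
is.numeric(x)  # TRUE
```

The numeric matrix x and the factor y could then be passed to glmnet, or to caret's train with method = "glmnet", in place of a data frame that still contains factor columns.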

How do you compute average marginal effects for glm.cluster models?

I am looking for a way to compute average marginal effects with clustered standard errors, which I seem to be having a few problems with. My model is as follows:
cseLogit <- miceadds::glm.cluster(data = data_long,
formula = follow ~ f1_distance + f2_distance + PolFol + MediaFol,
cluster = "id",
family = binomial(link = "logit"))
The dependent variable is binary (0/1) and all explanatory variables are numeric. I've tried two different ways of getting average marginal effects. The first one is:
marginaleffects <- margins(cseLogit, vcov = your_matrix)
Which gives me the following error:
Error in find_data.default(model, parent.frame()) :
'find_data()' requires a formula call
I've also tried this:
marginaleffects <- with(cseLogit, margins(glm_res, vcov=vcov))
which gives me this error:
Error in eval(predvars, data, env) :
object 'f1_distance' was not found
In addition: warnings:
1: In dydx.default(X[[i]], ...) :
Class of variable, f1_distance, is unrecognized. Returning NA.
2: In dydx.default(X[[i]], ...) :
Class of variable, f2_distance, is unrecognized. Returning NA.
Can you tell me what I'm doing wrong? If I haven't provided enough information, please let me know. Thanks in advance.

Fail to predict woe in R

I used this code to get the woe:
library("woe")
woe.object <- woe(data, Dependent="target", FALSE,
Independent="shop_id", C_Bin=20, Bad=0, Good=1)
Then I want to predict woe for the test data
test.woe <- predict(woe.object, newdata = test, replace = TRUE)
And it gives me an error
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "data.frame"
Any suggestions please?
You cannot do prediction with the package woe. You need to use the package klaR. Take note of the masking of the function woe, see below:
#let's say woe was loaded first and then klaR
library(klaR)
data = data.frame(target=sample(0:1,100,replace=TRUE),
shop_id = sample(1:3,100,replace=TRUE),
another_var = sample(letters[1:3],100,replace=TRUE))
#make sure both dependent and independent are factors
data$target=factor(data$target)
data$shop_id = factor(data$shop_id)
data$another_var = factor(data$another_var)
You need two or more independent variables:
woemodel <- klaR::woe(target~ shop_id+another_var,
data = data)
If you only provide one, you have an error:
woemodel <- klaR::woe(target~ shop_id,
data = data)
Error in woe.default(x, grouping, weights = weights, ...) : All
factors with unique levels. No woes calculated! In addition: Warning
message: In woe.default(x, grouping, weights = weights, ...) : Only
one single input variable. Variable name in resulting object$woe is
only conserved in formula call.
If you want to predict the dependent variable with only one independent, something like logistic regression will work:
mdl = glm(target ~ shop_id,data=data,family="binomial")
prob = predict(mdl,data,type="response")
#prob is the probability of the second factor level; R indexing starts at 1
predicted_label = ifelse(prob>0.5,levels(data$target)[2],levels(data$target)[1])

LME error in model.frame.default ... variable lengths differ

I am trying to run a random effects model with LME. It is part of a larger function and I want it to be flexible so that I can pass the fixed (and ideally random) effects variable names to the lme function as variables. get() worked great for this where I started with lm, but it only seems to throw the ambiguous "Error in model.frame.default(formula = ~var1 + var2 + ID, data = list( : variable lengths differ (found for 'ID')." I'm stumped, the data are the same lengths, there are no NAs in this data or the real data, ...
set.seed(12345) #because I got scolded for not doing this previously
var1="x"
var2="y"
exdat<-data.frame(ID=c(rep("a",10),rep("b",10),rep("c",10)),
x = rnorm(30,100,1),
y = rnorm(30,100,2))
#exdat<-as.data.table(exdat) #because the data are actually in a dt, but that doesn't seem to be the issue
Works great
lm(log(get(var1))~log(get(var2)),data=exdat)
lme(log(y)~log(x),random=(~1|ID), data=exdat)
Does not work
lme(log(get(var1,pos=exdat))~log(get(var2)),random=(~1|ID), data=exdat)
Does not work, but throws a new error code: "Error in model.frame.default(formula = ~var1 + var2 + rfac + exdat, data = list( : invalid type (list) for variable 'exdat'"
rfac="ID"
lme(log(get(var1))~log(get(var2)),random=~1|get(rfac,pos=exdat), data=exdat)
Part of the problem seems to be with the nlme package. If you can consider using lme4, the desired results can be obtained with:
lme4::lmer(log(get(var1)) ~ log(get(var2)) + (1 | ID),
data = exdat)
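If staying with nlme is preferred, one possible alternative (a sketch only, not checked against the asker's real data) is to avoid get() inside the formula and instead paste the variable names into formula objects before calling lme. The names fixed_formula and random_formula are made up for illustration:

```r
var1 <- "x"
var2 <- "y"
rfac <- "ID"

set.seed(12345)
exdat <- data.frame(ID = rep(c("a", "b", "c"), each = 10),
                    x  = rnorm(30, 100, 1),
                    y  = rnorm(30, 100, 2))

# Build the formulas from strings so they contain the actual column names.
fixed_formula  <- as.formula(paste0("log(", var1, ") ~ log(", var2, ")"))
random_formula <- as.formula(paste0("~ 1 | ", rfac))

fixed_formula   # log(x) ~ log(y)
random_formula  # ~1 | ID

# Fit only if nlme is installed.
if (requireNamespace("nlme", quietly = TRUE)) {
  fit <- nlme::lme(fixed_formula, random = random_formula, data = exdat)
  print(nlme::fixef(fit))
}
```

Because the formulas name the columns directly, model.frame can find every variable in exdat, and both the fixed and random parts can still be supplied as strings by the calling function.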

Error in R: "Error in tree.control(nobs, ...) : unused argument (type = "class")"

I am building a decision tree using the tree library in R. I have attempted to fit my model as follows:
model <- tree(Outcome ~ Age + Sex + Income, data = train, type = "class")
Running the above line gives me an error as follows:
Error in tree.control(nobs, ...) : unused argument (type = "class")
I down sampled so that each class is equal and so did not specify any weights. If I remove the argument, type = "class", the model runs but when I predict using the model, it seems that it is building a regression model which I do not want.
Can someone help?
If you look at the help page ?tree, there is no argument called type. If you are getting a regression tree, that is because Outcome is a numeric variable. I expect that you can fix this by adding
train$Outcome = factor(train$Outcome)
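A small sketch of that fix on made-up data (the train data frame below is a stand-in for the asker's, and the tree fit is guarded in case the package is not installed). With a factor response, tree() fits a classification tree, and type = "class" is then an argument to predict() and not to tree():

```r
set.seed(1)
train <- data.frame(
  Outcome = rep(c(0, 1), each = 50),
  Age     = c(rnorm(50, 30, 5), rnorm(50, 50, 5)),
  Sex     = factor(sample(c("F", "M"), 100, replace = TRUE)),
  Income  = runif(100, 20, 100)
)

is.numeric(train$Outcome)               # TRUE: tree() would fit a regression tree
train$Outcome <- factor(train$Outcome)  # convert the response to a factor
is.factor(train$Outcome)                # TRUE: tree() now fits a classification tree

# Fit and predict class labels only if the tree package is available.
if (requireNamespace("tree", quietly = TRUE)) {
  model <- tree::tree(Outcome ~ Age + Sex + Income, data = train)
  print(head(predict(model, newdata = train, type = "class")))
}
```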
