Getting variable names from glmnet lasso into a data.frame

I'm working with a phyloseq object ps.scale and trying to get the most important variables/features that can predict health status sample_data(ps.scale)$group.
Code is as follows:
library(phyloseq)  # provides sample_data() and otu_table()
library(glmnet)
metadata <- factor(sample_data(ps.scale)$group)
otu_tab <- otu_table(ps.scale)
otu_tab <- apply(otu_tab, 2, function(x) (x + 1) / sum(x + 1))  # add a pseudocount, then convert to proportions
otu_tab <- t(log10(otu_tab))
y <- metadata
x <- otu_tab
lasso <- cv.glmnet(x, y, family="multinomial", alpha=1)
print(lasso)
plot(lasso)
So I get the results and a plot here.
#Call: cv.glmnet(x = x, y = y, family = "multinomial", alpha = 1)
#Measure: Multinomial Deviance
# Lambda Index Measure SE Nonzero
#min 0.03473 36 1.704 0.05392 68
#1se 0.05529 26 1.751 0.05474 16
Now I want to extract the important variables/features (i.e., OTUs). Below are some code snippets I gathered from the internet:
Code 1
all_1se <- coef(lasso, s = "lambda.1se")
chosen_1se <- all_1se[all_1se > 0, ]
chosen_1se
#Error: 'list' object cannot be coerced to type 'double'
Code 2
tmp_coeffs <- coef(lasso, s = "lambda.1se")
data.frame(name = tmp_coeffs@Dimnames[[1]][tmp_coeffs@i + 1], coefficient = tmp_coeffs@x)
#Error in data.frame(name = tmp_coeffs@Dimnames[[1]][tmp_coeffs@i + 1], :
# trying to get slot "Dimnames" from an object of a basic class ("list") with no slots
Code 3
myCoefs <- coef(lasso, s="lambda.min");
myCoefs[which(myCoefs != 0 ) ]
myCoefs@Dimnames[[1]][which(myCoefs != 0)] # feature names: intercept included
## Assemble into a data.frame
myResults <- data.frame(
  features = myCoefs@Dimnames[[1]][which(myCoefs != 0)], # intercept included
  coefs = myCoefs[which(myCoefs != 0)] # intercept included
)
myResults
#Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'which': 'list' object cannot be coerced to type 'double'
#3.h(simpleError(msg, call))
#2..handleSimpleError(function (cond)
# .Internal(C_tryCatchHelper(addr, 1L, cond)), "'list' object cannot be coerced to type 'double'",
# base::quote(which(myCoefs != 0)))
#1.which(myCoefs != 0)
I need help fixing the above errors, mainly 'list' object cannot be coerced to type 'double'.
Thank you in advance.
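For what it's worth, the common thread in all three errors seems to be that with family = "multinomial", coef() on a cv.glmnet fit returns a list with one sparse coefficient matrix per class, not a single matrix. Comparisons such as all_1se > 0 and slot access such as @Dimnames then fail on the list. Below is a minimal sketch of one way to collect the nonzero coefficients per class. The simulated x and y and the names coef_list and selected are made up for illustration and stand in for the phyloseq-derived objects above.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the OTU matrix: 150 samples x 20 features
x <- matrix(rnorm(150 * 20), nrow = 150,
            dimnames = list(NULL, paste0("OTU", 1:20)))
# Three-class outcome that depends on the first two features
y <- factor(ifelse(x[, 1] > 0.5, "A", ifelse(x[, 2] > 0.5, "B", "C")))

fit <- cv.glmnet(x, y, family = "multinomial", alpha = 1)

# For a multinomial fit this is a list of sparse matrices, one per class
coef_list <- coef(fit, s = "lambda.1se")

# Convert each class's matrix to a data.frame of its nonzero entries
selected <- do.call(rbind, lapply(names(coef_list), function(cls) {
  m <- as.matrix(coef_list[[cls]])   # (p + 1) x 1, intercept in the first row
  keep <- m[, 1] != 0                # != 0 keeps negative coefficients too
  data.frame(class = rep(cls, sum(keep)),
             feature = rownames(m)[keep],
             coefficient = m[keep, 1],
             row.names = NULL)
}))
selected
```

Testing with != 0 instead of > 0 matters here, because a feature with a negative coefficient is still a selected feature. The intercept row is kept in this sketch; drop it afterwards if only OTUs are wanted.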

Related

Error in glmnet if I specify a variable to be a factor

I have a data frame in R on which I would like to fit a glmnet model. The y variable was originally numeric but takes only the values 0 and 1. If I convert it to a factor as follows
df_ML_1976[,names] <- lapply(df_ML_1976[,names] , factor)
and then apply glmnet after dividing into training and test set:
library("dplyr")
df_ML_1976 %>%
select(where(~ any(. != 0)))
#df_ML_1976 <- subset(df_ML_1976, select = -c(X))
library("caret")
default_idx = createDataPartition(df_ML_1976$y_tr4, p = 0.75, list = FALSE)
default_trn = df_ML_1976[default_idx, ]
default_tst = df_ML_1976[-default_idx, ]
## Fitting elasticnet:
cv_5 = trainControl(method = "cv", number = 5)
def_elnet = train(
y_tr4 ~ ., data = default_trn,
method = "glmnet",
trControl = cv_5
)
def_elnet
an error occurs:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'drop': non-conformable arguments
which does not appear if I do not specify
df_ML_1976[,names] <- lapply(df_ML_1976[,names] , factor)
Why does this happen?
Thank you

R error: all arguments must have the same length

I get an error when running naive Bayes in R. Here are my code and the error:
library(e1071)
#data
train_data <- read.csv('https://raw.githubusercontent.com/JonnyyJ/data/master/train.csv',header=T)
test_data <- read.csv('https://raw.githubusercontent.com/JonnyyJ/data/master/test.csv',header=T)
efit <- naiveBayes(y~job+marital+education+default+contact+month+day_of_week+
poutcome+age+pdays+previous+cons.price.idx+cons.conf.idx+euribor3m
,train_data)
pre <- predict(efit, test_data)
bayes_table <- table(pre, test_data[,ncol(test_data)])
accuracy_test_bayes <- sum(diag(bayes_table))/sum(bayes_table)
list('predict matrix'=bayes_table, 'accuracy'=accuracy_test_bayes)
ERROR:
bayes_table <- table(pre, test_data[,ncol(test_data)])
Error in table(pre, test_data[, ncol(test_data)]) :
all arguments must have the same length
accuracy_test_bayes <- sum(diag(bayes_table))/sum(bayes_table)
Error in diag(bayes_table) : object 'bayes_table' not found
list('predict matrix'=bayes_table, 'accuracy'=accuracy_test_bayes)
Error: object 'bayes_table' not found
I really don't understand what's going on, because I'm new to R.

For some reason, the default predict(efit, test_data, type = "class") doesn't work in this case (probably because your model predicts 0 for all observations in the test dataset). You also need to build the table from your outcome column: test_data[, ncol(test_data)] returns euribor3m, not y. The following should work:
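library(dplyr)
pre <- predict(efit, test_data, type = "raw") %>%
  as.data.frame() %>%
  # pick the class with the larger posterior probability; the columns are
  # assumed to be named `0` and `1` after the levels of y
  mutate(prediction = if_else(`1` > `0`, 1, 0)) %>%
  pull(prediction)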
bayes_table <- table(pre, test_data$y)
accuracy_test_bayes <- sum(diag(bayes_table)) / sum(bayes_table)
list('predict matrix' = bayes_table, 'accuracy' = accuracy_test_bayes)
# $`predict matrix`
#
# pre 0 1
# 0 7282 956
#
# $accuracy
# [1] 0.8839524
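Note that the table above has a single row because only class 0 was predicted, so diag() picks up just the top-left cell. The following is a small sketch, using made-up vectors truth and pred in place of the real data, of how fixing the factor levels on both sides keeps the confusion table square even when a class is never predicted.

```r
# Made-up labels for illustration only
truth <- c(0, 0, 1, 1, 0, 1)
pred  <- c(0, 0, 0, 0, 0, 0)   # a model that always predicts 0

lv <- c(0, 1)                  # the full set of classes
tab <- table(pred  = factor(pred,  levels = lv),
             truth = factor(truth, levels = lv))

dim(tab)                                # 2 x 2, including an all-zero row for class 1
accuracy <- sum(diag(tab)) / sum(tab)   # correct predictions / all predictions
accuracy
```

With a square table, diag() always lines up predicted and true classes, so the accuracy formula stays valid whatever the model predicts.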

Error 'Non-numeric argument to mathematical function' when running ScheffeTest with aov object

I ran the following code (taken from here):
set.seed(123)
Njk <- 10
P <- 2
Q <- 2
R <- 3
DV_t1 <- rnorm(P*Q*Njk, -3, 2)
DV_t2 <- rnorm(P*Q*Njk, 1, 2)
DV_t3 <- rnorm(P*Q*Njk, 2, 2)
dfSPFpq.rL <- data.frame(id=factor(rep(1:(P*Q*Njk), times=R)),
IVbtw1=factor(rep(1:P, times=Q*R*Njk)),
IVbtw2=factor(rep(rep(1:Q, each=P*Njk), times=R)),
IVwth=factor(rep(1:R, each=P*Q*Njk)),
DV=c(DV_t1, DV_t2, DV_t3))
aovSPFpq.r <- aov(DV ~ IVbtw1*IVbtw2*IVwth + Error(id/IVwth), data=dfSPFpq.rL)
summary(aovSPFpq.r)
Now I want to run ScheffeTest:
I tried:
library(DescTools)
ScheffeTest.aov(aovSPFpq.r)
Error in pf(psi^2/(MSE * sscoeff * dfgrp), df1 = dfgrp, df2 = dferr, lower.tail = FALSE) :
Non-numeric argument to mathematical function
ScheffeTest(aovSPFpq.r)
Error in model.frame.default(formula = x ~ g, drop.unused.levels = TRUE) :
invalid type (list) for variable 'x'
What is the correct way to run ScheffeTest with aov object?
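One thing that may be worth checking (an observation, not a confirmed diagnosis): when the formula contains an Error() term, aov() returns an object of class aovlist, not aov, so functions written for a plain aov fit may not handle it. The sketch below uses the built-in npk data set instead of the simulated data above, and the names fit_plain and fit_err are illustrative.

```r
# npk ships with R in the datasets package
fit_plain <- aov(yield ~ N * P * K, data = npk)
fit_err   <- aov(yield ~ N * P * K + Error(block), data = npk)

class(fit_plain)   # a plain aov fit
class(fit_err)     # a list of fits, one per error stratum
names(fit_err)     # the strata
```

If the class of aovSPFpq.r turns out to be aovlist, that would be consistent with both error messages complaining about a list.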

Error using random forest (MICE package) during imputation

I would like to use the random forest method to impute missing values. I have read some papers that claim that MICE with random forest performs better than parametric MICE.
In my case, I have already run a model with the default mice settings, got the results and played with them. However, when I switched the method to random forest, I got an error and I'm not sure why. I've seen some questions about errors with random forest and mice, but they do not match my case. My variables have more than a single NA.
imp <- mice(data1, m=70, pred=quickpred(data1), method="pmm", seed=71152, printFlag=TRUE)
impRF <- mice(data1, m=70, pred=quickpred(data1), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
Does anyone have any idea why I'm getting this error?
EDIT
I tried changing all variables to numeric instead of using dummy variables, and it returned the same error plus some warnings:
impRF <- mice(data, m=70, pred=quickpred(data), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac CliForm
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
In addition: There were 50 or more warnings (use warnings() to see the first 50)
50: In randomForest.default(x = xobs, y = yobs, ntree = 1, ... :
The response has five or fewer unique values. Are you sure you want to do regression?
EDIT1
I've tried only with 5 imputations and a smaller subset of the data, with only 2000 rows and got a few different errors:
> imp <- mice(data2, m=5, pred=quickpred(data2), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac Radio Origin Job Alc Smk Drugs Prison Commu Hmless Symp
Error in randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs in foreign function call (arg 11)
In addition: Warning messages:
1: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : invalid mtry: reset to within valid range
2: In max(ncat) : no non-missing arguments to max; returning -Inf
3: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs introduced by coercion

I also encountered this error when I had only one fully observed variable, which I'm guessing is the cause in your case too. My colleague Anoop Shah provided me with a fix (below) and Prof van Buuren (mice's author) has said he will include it in the next update of the package.
In R type the following to enable you to redefine the rf impute function.
fixInNamespace("mice.impute.rf", "mice")
The corrected function to paste in is then:
mice.impute.rf <- function (y, ry, x, ntree = 100, ...){
  ntree <- max(1, ntree)
  xobs <- as.matrix(x[ry, ])
  xmis <- as.matrix(x[!ry, ])
  yobs <- y[ry]
  onetree <- function(xobs, xmis, yobs, ...) {
    fit <- randomForest(x = xobs, y = yobs, ntree = 1, ...)
    leafnr <- predict(object = fit, newdata = xobs, nodes = TRUE)
    nodes <- predict(object = fit, newdata = xmis, nodes = TRUE)
    donor <- lapply(nodes, function(s) yobs[leafnr == s])
    return(donor)
  }
  forest <- sapply(1:ntree, FUN = function(s) onetree(xobs,
    xmis, yobs, ...))
  impute <- apply(forest, MARGIN = 1, FUN = function(s) sample(unlist(s),
    1))
  return(impute)
}
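A plausible reading of why the as.matrix() calls in this function matter, assuming x arrives as a matrix with a single predictor column: row-subsetting such a matrix drops it to a plain vector with no dimensions. The toy x and ry below are made up for illustration.

```r
# One predictor column, six rows (toy data)
x  <- matrix(1:6, ncol = 1)
ry <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)   # TRUE = y observed

dim(x[ry, ])                  # NULL: the subset collapsed to a vector
dim(as.matrix(x[ry, ]))       # 4 1: restored to a one-column matrix
dim(x[ry, , drop = FALSE])    # 4 1: alternative that never drops dimensions
```

Either as.matrix() or drop = FALSE keeps the predictors in matrix form for the subsequent fit.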

formula error inside function

I want to use survfit() and basehaz() inside a function, but they do not work. Could you take a look at this problem? Thanks for your help. The following code leads to the error:
library(survival)
n <- 50 # total sample size
nclust <- 5 # number of clusters
clusters <- rep(1:nclust,each=n/nclust)
beta0 <- c(1,2)
set.seed(13)
#generate phmm data set
Z <- cbind(Z1=sample(0:1,n,replace=TRUE),
Z2=sample(0:1,n,replace=TRUE),
Z3=sample(0:1,n,replace=TRUE))
b <- cbind(rep(rnorm(nclust),each=n/nclust),rep(rnorm(nclust),each=n/nclust))
Wb <- matrix(0,n,2)
for( j in 1:2) Wb[,j] <- Z[,j]*b[,j]
Wb <- apply(Wb,1,sum)
T <- -log(runif(n,0,1))*exp(-Z[,c('Z1','Z2')]%*%beta0-Wb)
C <- runif(n,0,1)
time <- ifelse(T<C,T,C)
event <- ifelse(T<=C,1,0)
mean(event)
phmmd <- data.frame(Z)
phmmd$cluster <- clusters
phmmd$time <- time
phmmd$event <- event
fmla <- as.formula("Surv(time, event) ~ Z1 + Z2")
BaseFun <- function(x){
start.coxph <- coxph(x, phmmd)
print(start.coxph)
betahat <- start.coxph$coefficient
print(betahat)
print(333)
print(survfit(start.coxph))
m <- basehaz(start.coxph)
print(m)
}
BaseFun(fmla)
Error in formula.default(object, env = baseenv()) : invalid formula
But the following code works:
fit <- coxph(fmla, phmmd)
basehaz(fit)

It is a problem of scoping.
Notice that the environment of basehaz is:
environment(basehaz)
<environment: namespace:survival>
meanwhile:
environment(BaseFun)
<environment: R_GlobalEnv>
That is why basehaz cannot find x, which is a variable local to BaseFun.
A possible solution is to send x to the top using assign:
BaseFun <- function(x){
assign('x',x,pos=.GlobalEnv)
start.coxph <- coxph(x, phmmd)
print(start.coxph)
betahat <- start.coxph$coefficient
print(betahat)
print(333)
print(survfit(start.coxph))
m <- basehaz(start.coxph)
print(m)
rm(x)
}
BaseFun(fmla)
Other solutions may involve dealing with the environments more directly.
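One alternative to assigning into the global environment, sketched here with lm() from base R, is to splice the formula itself into the call before evaluating it, so the stored call no longer refers to a local symbol. The helper names fit_symbol and fit_inlined and the toy data d are hypothetical, and whether the same pattern resolves the coxph()/basehaz() case above is an assumption that has not been verified here.

```r
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2), x = 1:5)

# The stored call keeps the symbol `form`, which only exists inside the function
fit_symbol <- function(form, data) {
  lm(form, data = data)
}

# bquote() replaces .(form) with the formula object before the call is evaluated
fit_inlined <- function(form, data) {
  eval(bquote(lm(.(form), data = data)))
}

a <- fit_symbol(y ~ x, d)
b <- fit_inlined(y ~ x, d)

deparse(a$call$formula)   # "form"
deparse(b$call$formula)   # "y ~ x"
```

Code that later re-evaluates the stored call then sees the actual formula and does not need to look up a variable that has gone out of scope.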

I'm following up on @moli's comment to @aatrujillob's answer. They were helpful, so I thought I would explain how they solved things for me in a similar problem with the rpart and partykit packages.
Some toy data:
N <- 200
data <- data.frame(X = rnorm(N),W = rbinom(N,1,0.5))
data <- within( data, expr = {
trtprob <- 0.4 + 0.08*X + 0.2*W -0.05*X*W
Trt <- rbinom(N, 1, trtprob)
outprob <- 0.55 + 0.03*X -0.1*W - 0.3*Trt
Outcome <- rbinom(N,1,outprob)
rm(outprob, trtprob)
})
I want to split the data into training (train_data) and testing sets, and train the classification tree on train_data.
Here's the formula I want to use, and the issue with the following example. When I define this formula, the train_data object does not yet exist.
my_formula <- Trt~W+X
exists("train_data")
# [1] FALSE
exists("train_data", envir = environment(my_formula))
# [1] FALSE
Here's my function, which is similar to the original function.
badFunc <- function(data, my_formula){
train_data <- data[1:100,]
ct_train <- rpart::rpart(
data= train_data,
formula = my_formula,
method = "class")
ct_party <- partykit::as.party(ct_train)
}
Trying to run this function throws an error similar to OP's.
library(rpart)
library(partykit)
bad_out <- badFunc(data=data, my_formula = my_formula)
# Error in is.data.frame(data) : object 'train_data' not found
# 10. is.data.frame(data)
# 9. model.frame.default(formula = Trt ~ W + X, data = train_data,
# na.action = function (x) {Terms <- attr(x, "terms") ...
# 8. stats::model.frame(formula = Trt ~ W + X, data = train_data,
# na.action = function (x) {Terms <- attr(x, "terms") ...
# 7. eval(expr, envir, enclos)
# 6. eval(mf, env)
# 5. model.frame.rpart(obj)
# 4. model.frame(obj)
# 3. as.party.rpart(ct_train)
# 2. partykit::as.party(ct_train)
# 1. badFunc(data = data, my_formula = my_formula)
print(bad_out)
# Error in print(bad_out) : object 'bad_out' not found
Luckily, rpart() is like coxph() in that you can specify the argument model=TRUE to solve these issues. Here it is again, with that extra argument.
goodFunc <- function(data, my_formula){
train_data <- data[1:100,]
ct_train <- rpart::rpart(
data= train_data,
## This solved it for me
model=TRUE,
##
formula = my_formula,
method = "class")
ct_party <- partykit::as.party(ct_train)
}
good_out <- goodFunc(data=data, my_formula = my_formula)
print(good_out)
# Model formula:
# Trt ~ W + X
#
# Fitted party:
# [1] root
# | [2] X >= 1.59791: 0.143 (n = 7, err = 0.9)
##### etc
Documentation for the model argument in rpart():
model:
if logical: keep a copy of the model frame in the result? If
the input value for model is a model frame (likely from an earlier
call to the rpart function), then this frame is used rather than
constructing new data.
Formulas can be tricky as they use lexical scoping and environments in a way that is not always natural (to me). Thank goodness Terry Therneau has made our lives easier with model=TRUE in these two packages!
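To make the scoping point concrete, here is a minimal base R sketch (the names make_formula, f, g and mf are made up for illustration). A formula remembers the environment in which it was created, and variables that are not found in data are looked up there.

```r
make_formula <- function() {
  z <- 1:5     # exists only inside this function call
  y ~ z        # the formula captures this function's environment
}

f <- make_formula()
ls(environment(f))   # "z"

# y comes from `data`; z is not in `data`, so it is found in environment(f)
mf <- model.frame(f, data = data.frame(y = c(2, 4, 6, 8, 10)))
mf$z

# A formula created here captures the current environment
g <- y ~ z
identical(environment(g), environment())
```

This is the mechanism behind the failures above: a formula defined at top level carries the top-level environment, so objects created later inside a function are invisible to any code that re-evaluates the model frame from the formula.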
