Not finding 'weightedMean' object for numFun using kNN in VIM package for R

I'm getting an error stating that the 'weightedMean' argument is not found for the 'numFun' parameter in the kNN imputation function within the VIM R package. I'm attempting to impute data in a fairly large dataset, and I want to use kNN with 5 neighbors using weighted means.
Here is my code:
df.imputed <- kNN(df, variable = c(...), dist_var = c(...), numFun = weightedMean, k = 5, weightDist = TRUE, trace = TRUE, imp_var = TRUE)
And the exact error is:
Error in args(numFun) : object 'weightedMean' not found
Based on the documentation (https://cran.r-project.org/web/packages/VIM/VIM.pdf page 29) it seems like this should work.

Try passing numFun = weighted.mean instead (the base R function name, with a dot).
At least that worked for me.
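As a rough illustration of why the rename helps: the error only says that R could not find any object called weightedMean in the session, whereas weighted.mean is an ordinary function from the stats package that is always available. The values and weights below are made up for the example and have nothing to do with how kNN computes its distances.

```r
# weighted.mean() ships with base R (package 'stats'), so it is always found.
x <- c(1, 2, 4)   # values of three hypothetical neighbours
w <- c(3, 2, 1)   # hypothetical weights, e.g. derived from distances

wm <- weighted.mean(x, w)   # computes sum(x * w) / sum(w)
print(wm)

# A bare name must resolve to an existing object before it can be passed on.
print(exists("weighted.mean"))
```

If a name passed as numFun does not resolve like this, the call fails before any imputation starts.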

Related

Extracting the relative influence from a gbm.fit object

I am trying to extract the relative influence of each variable from a gbm.fit object but it is coming up with the error below:
> summary(boost_cox, plotit = FALSE)
Error in data.frame(var = object$var.names[i], rel.inf = rel.inf[i]) :
row names contain missing values
The boost_cox object itself is fitted as follows:
boost_cox = gbm.fit(x = x,
y = y,
distribution="coxph",
verbose = FALSE,
keep.data = TRUE)
I have to use the gbm.fit function rather than the standard gbm function due to the large number of predictors (26k+).
I have now solved this issue myself.
The relative.influence() function can be used and works for objects created using both gbm() and gbm.fit(). However, it does not provide the plots as in the summary() function.
I hope this helps anyone else looking in the future.
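A minimal sketch of that approach, assuming the gbm package is installed. The data here is simulated and uses a Gaussian response rather than the Cox model from the question, so treat it only as an outline of the calls involved.

```r
rel_inf <- NULL

if (requireNamespace("gbm", quietly = TRUE)) {
  set.seed(1)
  # Simulated predictors and response (hypothetical, not the original data)
  x <- data.frame(a = rnorm(200), b = rnorm(200), c = rnorm(200))
  y <- 2 * x$a + rnorm(200)

  fit <- gbm::gbm.fit(x = x, y = y, distribution = "gaussian",
                      n.trees = 50, verbose = FALSE)

  # Named numeric vector with one entry per predictor, no plot
  rel_inf <- gbm::relative.influence(fit, n.trees = 50, sort. = TRUE)
  print(rel_inf)
}
```

The result can be turned into a data frame or plotted manually, for example with barplot(rel_inf), if a picture like the one from summary() is needed.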

Unused argument error when building a Confusion Matrix in R

I am currently trying to run a logistic regression model on my data frame.
While creating a new model frame with the actual and predicted values, I get the following error message.
Error
Error in confusionMatrix(as.factor(log_class), lgtest$Satisfaction, positive = "satisfied") :
unused argument (positive = "satisfied")
This is my model:
#### Logistic regression model
log_model = glm(Satisfaction~., data = lgtrain, family = "binomial")
summary(log_model)
log_preds = predict(log_model, lgtest[,1:22], type = "response")
head(log_preds)
log_class = array(c(99))
for (i in 1:length(log_preds)){
if(log_preds[i]>0.5){
log_class[i]="satisfied"}else{log_class[i]="neutral or dissatisfied"}}
### Creating a new modelframe containing the actual and predicted values.
log_result = data.frame(Actual = lgtest$Satisfaction, Prediction = log_class)
lgtest$Satisfaction = factor(lgtest$Satisfaction, c(1,0),labels=c("satisfied","neutral or dissatisfied"))
lgtest
confusionMatrix(log_class, log_preds, threshold = 0.5) ####this works
mr1 = confusionMatrix(as.factor(log_class),lgtest$Satisfaction, positive = "satisfied") ## this is the line that causes the error
I had the same problem. I typed "?confusionMatrix" and got this output:
Help on topic 'confusionMatrix' was found in the following packages:
confusionMatrix
(in package InformationValue in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Create a confusion matrix
(in package caret in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Confusion Matrix
(in package ModelMetrics in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
As this output shows, a function named confusionMatrix exists in more than one installed package, so we need to specify which package's version we want to use.
So I wrote the call as "caret::confusionMatrix(...)" and it worked!
This is how the code can be written to get rid of the unused argument error when building a confusion matrix in R:
caret::confusionMatrix(
data = new_tree_predict$predicted,
reference = new_tree_predict$actual,
positive = "True"
)
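The mechanism can be reproduced in base R without any of those packages. In the sketch below, confusion_demo is a hypothetical stand-in for a confusionMatrix variant that has no positive argument, and the masked sd function only imitates what happens when a later-loaded package hides an earlier one.

```r
# Hypothetical stand-in for a confusionMatrix() that has no `positive` argument.
confusion_demo <- function(actuals, predictedScores, threshold = 0.5) {
  table(actual = actuals, predicted = as.integer(predictedScores > threshold))
}

# Passing an argument the function does not define reproduces the error type.
msg <- tryCatch(
  confusion_demo(c(1, 0), c(0.9, 0.2), positive = "satisfied"),
  error = function(e) conditionMessage(e)
)
print(msg)

# `pkg::name` always picks the function from that package, whatever is masked.
sd <- function(x) "masked version"   # masks stats::sd in this session
masked_result <- sd(c(1, 2, 3))
explicit_result <- stats::sd(c(1, 2, 3))
rm(sd)                               # remove the mask again

print(masked_result)
print(explicit_result)
```

Running environmentName(environment(confusionMatrix)) in the affected session is one way to check which package's version is currently being picked up.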

Error in eval(parse()) - r unable to find argument input

I am very new to R, and this is my first time encountering the eval() function. I am trying to use the med and boot.med functions from the mma package to conduct mediation analysis. med and boot.med take in models such as linear models, together with data frames that specify mediators and predictors, and then estimate the mediation effect of each mediator.
The author of the package gives the flexible option of specifying one's own custom.function. From the source code of med, it can be seen that the custom.function is passed to eval(). So I tried to insert the gbmt function as the custom function. However, R kept giving me this error message: Error during wrapup: Number of trees to be used in prediction must be provided. I have been searching online for days and tried many ways of specifying the number of trees parameter n.trees, but nothing works (I believe others have raised similar issues: post 1, post 2).
The following code is part of the source code of the med function:
cf1 = gsub("responseY", "y[,j]", custom.function[j])
cf1 = gsub("dataset123", "x2", cf1)
cf1 = gsub("weights123", "w", cf1)
full.model[[j]] <- eval(parse(text = cf1))
One custom function example the author gives in the package documentation is as follows:
temp1<-med(data=data.bin,n=2,custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
Here glm is the custom function. This example code works and you can replicate it easily (if you have mma installed and loaded). However, when I try to use the gbmt function on a survival object, I get errors. Here is what my code looks like:
temp1 <- med(data = data.surv,n=2,type = "link",
custom.function = 'gbmt(responseY ~.,
data = dataset123,
distribution = dist,
train_params = start_stop,
cv_folds=10,
keep_gbm_data = TRUE,
)')
Does anyone have any idea how the argument for the number of trees, n.trees, can be added somewhere in the above code?
Many thanks in advance!
Update: in order to replicate the example code, please install mma and try the following:
library("mma")
data("weight_behavior") ##binary x #binary y
x=weight_behavior[,c(2,4:14)]
pred=weight_behavior[,3]
y=weight_behavior[,15]
data.bin<-data.org(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10), binref=c(1,1),catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4)
temp1<-med(data=data.bin,n=2) #or use self-defined final function
temp1<-med(data=data.bin,n=2, custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
When I change the custom.function to gbmt and use a survival object as responseY, the error occurs. When I use the gbmt function on my data outside the med function, there is no error.
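To make the substitution step easier to follow, here is a small base R sketch of what the quoted lines from med appear to do with the custom.function string. It uses lm and the built-in mtcars data as stand-ins; the objects y, x2, w and j are assumptions named after the placeholders in the quoted source, not the package's real internals. It does not address the n.trees error itself.

```r
# Template string with the same placeholders as in the question
custom.function <- "lm(responseY ~ ., data = dataset123, weights = weights123)"

# Stand-ins for the objects the placeholders get mapped to (assumed names)
y  <- matrix(mtcars$mpg, ncol = 1)   # response matrix, one column per outcome
x2 <- mtcars[, c("wt", "hp")]        # predictor data frame
w  <- rep(1, nrow(mtcars))           # observation weights
j  <- 1                              # index of the current outcome

# Same three substitutions as in the quoted source
cf1 <- gsub("responseY", "y[,j]", custom.function)
cf1 <- gsub("dataset123", "x2", cf1)
cf1 <- gsub("weights123", "w", cf1)
print(cf1)   # the text that will actually be parsed and evaluated

model <- eval(parse(text = cf1))
print(coef(model))
```

Printing the substituted string like this shows exactly which call is evaluated, which can help when checking how extra arguments inside the string are handled.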

Grid tuning xgboost with missing data

It seems like the expected method of grid tuning the xgboost model is using the caret package, as clearly displayed here: https://stats.stackexchange.com/questions/171043/how-to-tune-hyperparameters-of-xgboost-trees
However, I struggle to make sense of the case with missing data. When creating the model without using caret, I set the missing to NA.
dtrain = xgb.DMatrix(data = data.matrix(train$data),label = train$label,missing = NA)
That allows me to create the model like so:
bst = xgboost(data = dtrain,depth = 4,eta =.3,nthread = 2,
nround = 43, print.every.n = 5,
objective = "binary:logistic",eval_metric = "auc",verbose = TRUE
)
This works very nicely, however, caret does not take this kind of object.
This is what I'm trying:
xgbtrain = train(x = train$data,y = as.factor(make.names(train$label)),
trControl = trControl, tuneGrid = my_grid,method = "xgbTree")
But for every iteration it is telling me this:
Error in xgb.DMatrix(as.matrix(x), label = y) : can not open file "NA"
That's the same error message I was getting before in regular xgboost when I didn't set missing to NA. The xgb.DMatrix is not a subsettable object I could take the data from, and it is also not possible to convert it to a data frame. How do I get around this?
EDIT
Figured it out. In the end, it had nothing to do with missing data, but with having factors in the dataset. Instead of using xgboost's function to convert to a sparse matrix, I used regular model.matrix() and was able to successfully plug in the new matrix into caret's train function.
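A minimal sketch of that conversion on a made-up data frame with one numeric column, one factor and a missing value. The na.pass step is an assumption added here so that rows with NA are kept rather than dropped; the original post does not say how missing values were handled.

```r
# Toy data: a numeric column with a missing value and a factor column
df <- data.frame(
  age   = c(23, 35, NA, 41),
  group = factor(c("a", "b", "a", "c"))
)

# model.frame() drops rows with NA by default; na.pass keeps them.
mf <- model.frame(~ ., data = df, na.action = na.pass)

# Factors are expanded into numeric dummy columns.
mm <- model.matrix(~ ., data = mf)
print(mm)
```

A numeric matrix like mm is the kind of object that can then be handed to train as x in place of the original data frame with factors.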

Error message not understood

I am trying to run a function in R, but I am getting a strange error message that gives me no hint about what might be wrong.
G2M1$data is just a matrix containing data.
library(klaR)
out <- NaiveBayes(x = G2M1$data, grouping = G2M1$labels, usekernel = TRUE, density(G2M1$data, bw = "nrd0", adjust = 1,kernel = "gaussian"))
error message:
Error in sum(prior) : invalid 'type' (list) of argument
I am not sure why, since I am not defining any prior?
The first step in asking a question on Stack Overflow is to create a reproducible example. That is a small example that users can input into their computers to test, diagnose, and solve your issue. It not only helps others but it also enables you to properly assess your problem and potentially find a solution while creating the example.
Example
G2M1 <- list(data=as.matrix(iris[-5]), labels=iris[[5]])
This is an example dataset in the same structure and name as your question using the iris dataset.
Recreate error
Let's run your expression as is to see the error:
library(klaR)
out <- NaiveBayes(x = G2M1$data, grouping = G2M1$labels, usekernel = TRUE, density(G2M1$data, bw = "nrd0", adjust = 1,kernel = "gaussian"))
#Error in sum(prior) : invalid 'type' (list) of argument
Now we have found the error with our example. Let's investigate why it's happening. Let's look at the density expression and save it to a variable:
den <- density(G2M1$data, bw = "nrd0", adjust = 1,kernel = "gaussian")
class(den)
#[1] "density"
typeof(den)
#[1] "list"
It is a list. Besides the densities, it also holds other information, such as the call used and the coordinates, that we do not need for the model. Where are the densities themselves? We look to the documentation:
y the estimated density values.
Let's subset the variable with y to see the densities:
head(den$y)
#[1] 0.0003561307 0.0004076448 0.0004647614 0.0005300218 0.0006043244 0.0006864581
This is what the model is looking for. We substitute den$y into the model call:
out <- NaiveBayes(x = G2M1$data, grouping = G2M1$labels, usekernel = TRUE, den$y)
Success. In the future, remember to create an example for everyone, and use these basic troubleshooting techniques. Good luck.
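One possible reading of why the message mentions prior even though none was defined: an unnamed argument is matched by position to the next free parameter. The sketch below uses nb_demo, a hypothetical function whose parameter order is only assumed to resemble NaiveBayes, to show that matching in base R. It is worth checking ?NaiveBayes to see whether density options such as bw are meant to be passed by name instead.

```r
# Hypothetical function with an assumed parameter order (not klaR's code)
nb_demo <- function(x, grouping, prior, usekernel = FALSE, ...) {
  if (!missing(prior)) {
    total <- sum(prior)   # fails if `prior` is a list-like object
  }
  names(list(...))        # names of whatever ended up in `...`
}

vals <- c(5.1, 4.9, 4.7, 4.6)
grp  <- factor(c("a", "a", "b", "b"))

# The unnamed density() result lands in `prior`, the next unmatched parameter.
msg <- tryCatch(
  nb_demo(x = vals, grouping = grp, usekernel = TRUE, density(vals)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Named options that match no parameter are collected in `...` instead.
dots <- nb_demo(x = vals, grouping = grp, usekernel = TRUE,
                bw = "nrd0", adjust = 1, kernel = "gaussian")
print(dots)
```

Under that reading, passing den$y in the same position also fills prior, so it is worth confirming that this is really what the model should receive.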
