'predict.pca' is not an exported object from 'namespace:mdatools' - r

I am kinda new in the R environment, and I am a beginner in programming. I've been using mdatools package to perform PCA analysis and predictions with this model, but I keep getting this error when trying to use predict function.
Error: 'predict.pca' is not an exported object from 'namespace:mdatools'
I already searched with "??predict.pca" to check whether the function has been moved to another package, but it doesn't look like it has.
I truly appreciate any suggestions on how to fix this error.
Thank you.
This is part of my code:
library(dplyr)
library(mdatools)
library(pca3d)
pca_trainset = trainset %>% select( -class )
pca_testset = testset
pca_car = pca( pca_trainset, scale = T )
str(pca_car)
plot(pca_car,show.labels = T)
train = data.frame( class = trainset$class, pca_car$calres$scores )
t = as.data.frame( mdatools::predict.pca( pca_car, pca_testset ) )

If we look carefully at the output of help(predict.pca):
predict.pca {mdatools}
PCA predictions
Description
Applies PCA model to a new data set.
Usage
## S3 method for class 'pca'
predict(object, x, ...)
Arguments
object a PCA model (object of class pca).
x data values (matrix or data frame).
... other arguments.
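Note that predict.pca is not an exported function, but an S3 method for the generic predict() on objects of class pca. That is why mdatools::predict.pca fails. The appropriate way to use it is to call predict(), which dispatches to the method based on the class of the object: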
t = as.data.frame( predict( pca_car, pca_testset ) )
See here for more.
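To illustrate the dispatch mechanism without mdatools, here is a minimal sketch that uses base R's prcomp() as a stand-in (an assumption for illustration only; the iris rows are arbitrary). As far as I know, predict.prcomp is likewise registered as an S3 method in stats without being exported, so the pattern is the same: call the generic, not the method.

```r
# Stand-in example with base R: prcomp() instead of mdatools::pca()
pc <- prcomp(iris[1:100, 1:4], scale. = TRUE)   # fit PCA on the first 100 rows
new_rows <- iris[101:150, 1:4]                  # "new" data to project

# Call the generic; R picks predict.prcomp because class(pc) is "prcomp"
scores <- predict(pc, new_rows)
dim(scores)

# The method exists and can be looked up, even though it is not exported
method <- getS3method("predict", "prcomp")
is.function(method)
"predict.prcomp" %in% getNamespaceExports("stats")
```

The same reasoning applies to the pca object from mdatools: predict(pca_car, pca_testset) reaches the method that mdatools::predict.pca cannot.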

Related

Extracting the relative influence from a gbm.fit object

I am trying to extract the relative influence of each variable from a gbm.fit object but it is coming up with the error below:
> summary(boost_cox, plotit = FALSE)
Error in data.frame(var = object$var.names[i], rel.inf = rel.inf[i]) :
row names contain missing values
The boost_cox object itself is fitted as follows:
boost_cox = gbm.fit(x = x,
y = y,
distribution="coxph",
verbose = FALSE,
keep.data = TRUE)
I have to use the gbm.fit function rather than the standard gbm function due to the large number of predictors (26k+).
I have now solved this issue myself.
The relative.influence() function can be used and works for objects created using both gbm() and gbm.fit(). However, it does not provide the plots as in the summary() function.
I hope this helps anyone else looking in the future.

Caret train function for multiple data frames as function

There was a similar question to mine over six years ago, and it hasn't been solved (R -- Can I apply the train function in caret to a list of data frames?).
This is why I am bringing up this topic again.
I'm writing my own functions for my big R project at the moment, and I'm wondering whether I can wrap the model training function train() of the package caret so that it works for different data frames with different predictors.
My function should look like this:
lda_ex <- function(data, predictor){
model <- train(predictor ~., data,
method = "lda",
trControl = trainControl(method = "none"),
preProc = c("center","scale"))
return(model)
}
Using it afterwards should work like this:
data_iris <- iris
predictor_iris <- "Species"
iris_res <- lda_ex(data = data_iris, predictor = predictor_iris)
Unfortunately, as far as I can tell, an R formula cannot take a variable as input.
Is there something I am missing?
Thank you in advance for helping me out!
Solving this would help me A LOT to keep my function sheet clean, and it would certainly save work.
By writing predictor_iris <- "Species", you are saving a string object in predictor_iris. Thus, when you run lda_ex, I guess you get an error from the formula in train(), since predictor ~ . asks R to model the one-element string itself, not the column it names, using vectors of covariates.
Indeed, I tried the following toy example:
X = rnorm(1000)
Y = runif(1000)
predictor = "Y"
lm(predictor ~ X)
which gives an error about differences in the lengths of variables.
Let me modify your function:
lda_ex <- function(data, formula){
model <- train(formula, data,
method = "lda",
trControl = trainControl(method = "none"),
preProc = c("center","scale"))
return(model)
}
The key difference is that now we must pass in the whole formula, instead of the predictor only. In that way, we avoid the string-related problem.
library(caret) # Remember to load the packages needed to reproduce your examples!
data_iris <- iris
formula_iris = Species ~ . # Key difference!
iris_res <- lda_ex(data = data_iris, formula = formula_iris)
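If you would still rather pass the column name as a string, one option is to build the formula from it inside the function. The sketch below uses base R's reformulate() and, to stay free of package dependencies, lm() on mtcars as a stand-in for train(); build_formula and fit_ex are hypothetical helper names. I would expect the resulting formula to work with train() in the same way, but that part is an assumption and not tested here.

```r
# Hypothetical helper: turn a response name (string) into "response ~ ."
build_formula <- function(response) {
  reformulate(".", response = response)
}

# Stand-in for lda_ex(): lm() replaces caret::train() for illustration
fit_ex <- function(data, response) {
  form <- build_formula(response)
  lm(form, data = data)
}

form_mpg <- build_formula("mpg")
fit <- fit_ex(data = mtcars, response = "mpg")
length(coef(fit))   # intercept plus one coefficient per remaining column
```

as.formula(paste(response, "~ .")) would be an equivalent way to build the same formula.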

Can't give a subset when using randomForest inside a function

I'm wanting to create a function that uses within it the randomForest function from the randomForest package. This takes the "subset" argument, which is a vector of row numbers of the data frame to use for training. However, if I use this argument when calling the randomForest function in another defined function, I get the error:
Error in eval(substitute(subset), data, env) :
object 'tr_subset' not found
Here is a reproducible example, where we attempt to train a random forest to classify a response "type" either "A" or "B", based on three numerical predictors:
library(randomForest)
# define a random data frame to train with
train.data = data.frame(
type = rep(NA, times = 500),
x = runif(500),
y = runif(500),
z = runif(500)
)
train.data$type[runif(500) >= 0.5] = "A"
train.data$type[is.na(train.data$type)] = "B"
train.data$type = as.factor(train.data$type)
# define the training range
training.range = sample(500)[1:300]
# formula to use
tr_form = formula(type ~ x + y + z)
# Function that includes the randomForest function
train_rf = function(form, all_data, tr_subset) {
p = randomForest(
formula = form,
data = all_data,
subset = tr_subset,
na.action = na.omit
)
return(p)
}
# test the new defined function
test_tree = train_rf(form = tr_form, all_data = train.data, tr_subset = training.range)
Running this gives the error:
Error in eval(substitute(subset), data, env) :
object 'tr_subset' not found
If, however, subset = tr_subset is removed from the randomForest call, and tr_subset is removed from the train_rf function, this code runs fine. However, the whole data set is then used for training!
It should be noted that using the subset argument in randomForest when not defined in another function works completely fine, and is the intended method for the function, as described in the vignette linked above.
I know that in the meantime I could just define another training set that contains only the required rows and train on all of it, but is there a reason why my original code doesn't work?
Thanks.
EDIT: I conjecture that, as subset() is a base R function, R is getting confused and thinking you're wanting to use the base R function rather than defining an argument of the randomForest function. I'm not an expert, though, so I may be wrong.
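A likely explanation, offered as an assumption based on the error message: the call eval(substitute(subset), data, env) comes from base R's model-frame machinery, which looks up the subset expression first in the data and then in the environment of the formula. Since tr_form was created at the top level, the function's local tr_subset is not visible there. The sketch below reproduces this with lm() as a stand-in for randomForest() (the stand-in and the names fit_with_subset_arg and fit_presubset are hypothetical) and shows the pre-subsetting workaround mentioned in the question.

```r
set.seed(1)
d <- data.frame(y = rnorm(500), x = runif(500))
rows <- sample(500)[1:300]     # training rows
form <- y ~ x                  # formula created outside the function

# Same pattern as train_rf(): pass the rows through the `subset` argument
fit_with_subset_arg <- function(form, all_data, tr_subset) {
  lm(form, data = all_data, subset = tr_subset)
}
attempt <- try(fit_with_subset_arg(form, d, rows), silent = TRUE)
failed <- inherits(attempt, "try-error")   # the local tr_subset is not found

# Workaround: subset the rows first, so nothing has to be looked up later
fit_presubset <- function(form, all_data, tr_subset) {
  lm(form, data = all_data[tr_subset, , drop = FALSE])
}
fit <- fit_presubset(form, d, rows)
nobs(fit)
```

Applied to the question, that would mean calling randomForest(formula = form, data = all_data[tr_subset, ], na.action = na.omit) inside train_rf.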

Create list of predefined S3 objects

I am busy comparing different machine learning techniques in R. This is the case: I wrote several functions that each automatically create a different prediction model (e.g. logistic regression, random forest, neural network, hybrid ensemble, etc.), along with predictions, confusion matrices, several statistics (e.g. AUC and F-score), and different plots.
I managed to create an S3 object that is able to store the required data.
However, when I try to create a list of my defined object, this fails and all data is stored sequentially in 1 big list.
This is my S3 object (as this is the first time I have created an S3 object, I am not sure that the code is 100% correct):
modelObject <- function(modelName , modelObject, modelPredictions , rocCurve , aUC , confusionMatrix )
{
modelObject <- list(
model.name = modelName,
model.object = modelObject,
model.predictions = modelPredictions,
roc.curve = rocCurve,
roc.auc = aUC,
confusion.matrix = confusionMatrix
)
## Set the name for the class
class(modelObject) <- "modelObject"
return(modelObject)
}
At the end of each machine learning function, I define and return the object.
Shortened example:
NeuralNetworkAnalysis<- function() {
#I removed the unnecessary code, as only the end of the code is relevant
nn.model <- modelObject(modelName = "Neural.Network" , modelObject = NN , modelPredictions = predNN , rocCurve = roc , aUC = auc , confusionMatrix = confu )
return(nn.model)
}
Finally, in my 'script' function, I create a list and try to append the different objects:
#function header and arguments before this part are irrelevant
# Build predictive model(s)
modelList = list("model" = modelObject)
modelList <- append(modelList , NeuralNetworkAnalysis())
modelList <- append(modelList, RandomForestAnalysis())
mod <<- RandomForestAnalysis() #this is to test what the outcome is when I do not put it in a list
return(modelList) } #end of the function ModelBuilding
models <- ModelBuilding( '01/01/2013' , '01/01/2014' , '02/01/2014' , '02/01/2015' )
Now, when I take a look at the models list, I don't have a list of objects, I just have a list with all the data of each algorithm.
> class(models)
[1] "list"
> class(mod)
[1] "modelObject"
How can I fix this problem, so that I can have a list that contains for example:
list$random.forest$variable.I.want.to.access (most favorable)
or
list[i]$variable.of.random.forest.that.I.want.to.access
thx in advance!
Olivier
Not sure if I understand correctly, but maybe the issue is only how your model list is built. If you try
modelList[["neural.network"]] <- NeuralNetworkAnalysis()
modelList[["random.forest"]] <- RandomForestAnalysis()
etc., does that give you the access methods you are looking for?
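To see why append() gives one big list, here is a minimal sketch with a simplified stand-in for the constructor (make_model and its two fields are hypothetical, not the asker's full object). An S3 object built on a list is itself a list, so append() concatenates its elements instead of adding the object as one element; assigning with [[<- (or wrapping the object in list()) keeps it intact.

```r
# Simplified stand-in for the modelObject() constructor
make_model <- function(name, auc) {
  structure(list(model.name = name, roc.auc = auc), class = "modelObject")
}
nn <- make_model("Neural.Network", 0.81)
rf <- make_model("Random.Forest", 0.86)

# append() concatenates: the object's fields become separate list elements
flat <- append(list(), nn)
names(flat)

# Assigning by name stores each object as a single element, class preserved
models <- list()
models[["neural.network"]] <- nn
models[["random.forest"]] <- rf
class(models$random.forest)
models$random.forest$roc.auc

# Wrapping in list() also works with append()
models2 <- append(list(), list(nn))
class(models2[[1]])
```

Note also that list("model" = modelObject) in the question stores the constructor function itself as the first element; starting from an empty list() avoids that.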

Write custom classifier in R and predict function

I would like to implement my own custom classifier in R, e.g., myClassifier(trainingSet, ...), which returns the learnt model m from a specified training set. I would like to call it just like any other classifier in R:
m <- myClassifier(trainingSet)
and then I want to overload (I don't know if this is the correct word) the generic function predict()
result <- predict(m, myNewData)
I have only a basic knowledge of R, and I don't know which resources I should read in order to accomplish this task. For this to work, do I need to create a package? I am looking for some initial directions.
Does the model m contain information about the overridden predict method? Or how does R know which predict.* method corresponds to model m?
Here is some code that shows how to write a method for your own class for a generic function.
# create a function that returns an object of class myClassifierClass
myClassifier = function(trainingData, ...) {
model = structure(list(x = trainingData[, -1], y = trainingData[, 1]),
class = "myClassifierClass")
return(model)
}
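# create a method for the generic function predict for class myClassifierClass
# (the signature follows the generic: predict(object, ...))
predict.myClassifierClass = function(object, newdata = NULL, ...) {
# placeholder: random scores, one per training observation; newdata is not used yet
return(rlogis(length(object$y)))
}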
# test
mA = matrix(rnorm(100*10), nrow = 100, ncol = 10)
modelA = myClassifier(mA)
predict(modelA)
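To answer the dispatch question more directly: the model does not store the method. It only carries a class attribute, and predict() looks for a function named predict.<class>. No package is needed for this to work in a script. Below is a minimal sketch with a deliberately trivial majority-class classifier (the learning rule and the data are made up for illustration) that also accepts new data, as in predict(m, myNewData).

```r
# Constructor: "learns" the most frequent label in the first column
myClassifier <- function(trainingData, ...) {
  labels <- trainingData[[1]]
  majority <- names(which.max(table(labels)))
  structure(list(majority = majority), class = "myClassifierClass")
}

# Method for the generic predict(); found via the class attribute of `object`
predict.myClassifierClass <- function(object, newdata, ...) {
  rep(object$majority, nrow(newdata))
}

trainingSet <- data.frame(label = c("a", "b", "b"), x = c(1, 2, 3))
myNewData <- data.frame(x = c(4, 5))

m <- myClassifier(trainingSet)
class(m)                         # this attribute drives the dispatch
result <- predict(m, myNewData)  # calls predict.myClassifierClass
result
```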

Resources