Create list of predefined S3 objects - r

I am comparing different machine learning techniques in R. I wrote several functions that each automatically build a different prediction model (e.g. logistic regression, random forest, neural network, hybrid ensemble, etc.), together with its predictions, confusion matrix, several statistics (e.g. AUC and F-score), and different plots.
I managed to create an S3 object that is able to store the required data.
However, when I try to create a list of these objects, it fails: all the data ends up stored sequentially in one big list.
This is my S3 object (as this is the first time I have created an S3 class, I am not sure the code is 100% correct):
modelObject <- function(modelName, modelObject, modelPredictions, rocCurve, aUC, confusionMatrix)
{
  modelObject <- list(
    model.name = modelName,
    model.object = modelObject,
    model.predictions = modelPredictions,
    roc.curve = rocCurve,
    roc.auc = aUC,
    confusion.matrix = confusionMatrix
  )
  ## Set the name for the class
  class(modelObject) <- "modelObject"
  return(modelObject)
}
At the end of each machine learning function, I define and return the object. Shortened example:
NeuralNetworkAnalysis <- function() {
  # I removed the unnecessary code, as only the end of the code is relevant
  nn.model <- modelObject(modelName = "Neural.Network", modelObject = NN, modelPredictions = predNN, rocCurve = roc, aUC = auc, confusionMatrix = confu)
  return(nn.model)
}
Finally, in my 'script' function, I create a list and try to append the different objects:
# function header and arguments before this part are irrelevant
# Build predictive model(s)
modelList = list("model" = modelObject)
modelList <- append(modelList, NeuralNetworkAnalysis())
modelList <- append(modelList, RandomForestAnalysis())
mod <<- RandomForestAnalysis() # this is to test what the outcome is when I do not put it in a list
return(modelList)
} # end of the function ModelBuilding
models <- ModelBuilding('01/01/2013', '01/01/2014', '02/01/2014', '02/01/2015')
Now, when I look at the models list, I do not have a list of objects; I just have a flat list with all the data of each algorithm.
class(models)
[1] "list"
class(mod)
[1] "modelObject"
How can I fix this problem, so that I have a list that supports, for example:
list$random.forest$variable.I.want.to.access (preferred)
or
list[i]$variable.of.random.forest.that.I.want.to.access
Thanks in advance!
Olivier

Not sure if I understand correctly, but maybe the issue is only how your model list is built. If you try
modelList[["neural.network"]] <- NeuralNetworkAnalysis()
modelList[["random.forest"]] <- RandomForestAnalysis()
etc., does that give you the access methods you are looking for?
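To illustrate why the original approach flattens the data: append() concatenates with c(), so the fields of an S3 object (which is itself a list) are spliced into the outer list and the class attribute is lost. The sketch below uses a hypothetical, stripped-down constructor makeModel() in place of the modelObject() function from the question, and made-up AUC values.

```r
# Hypothetical, simplified stand-in for the constructor in the question
makeModel <- function(name, auc) {
  structure(list(model.name = name, roc.auc = auc), class = "modelObject")
}

# append() splices the object's fields into the outer list
flat <- append(list(), makeModel("Neural.Network", 0.81))
length(flat)   # 2: one element per field, not one element per model

# Assigning by name with [[<- stores each object as a single element
modelList <- list()
modelList[["neural.network"]] <- makeModel("Neural.Network", 0.81)
modelList[["random.forest"]]  <- makeModel("Random.Forest", 0.86)

# Wrapping the object in list() also keeps it intact when appending
wrapped <- append(list(), list(makeModel("Neural.Network", 0.81)))

# Access by name, and pick the model with the highest AUC
rf_auc <- modelList$random.forest$roc.auc
aucs   <- vapply(modelList, function(m) m$roc.auc, numeric(1))
best   <- names(which.max(aucs))
```

Note that positional access needs double brackets, e.g. modelList[[2]]$roc.auc; single brackets return a sub-list rather than the stored object.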

Related

Creating multiple GLM in a for loop, skipping models in the loop where the coefficients do not work in R

I have created a list that contains all possible combinations of my dependent variables and I am trying to create multiple glm with all of those combinations.
combinations = list()
models = list()
for (i in seq_along(combinations)) {
  models[[i]] <- glm(as.formula(paste("(x) ~", combinations[[i]])), family = 'Gamma', data = df)
}
I get the error message:
Error: no valid set of coefficients has been found: please supply starting values
I know this is because, for one or a few models, the combination of dependent variables causes problems. How can I skip the models that cause those problems (or leave them blank) and keep the loop going?
As a side comment: I tried fitting a normal (Gaussian) glm on the exponentiated response instead, which worked, but I would really like to stay with the Gamma family.
combinations = list()
models = list()
for (i in seq_along(combinations)) {
  models[[i]] <- glm(as.formula(paste("exp(x) ~", combinations[[i]])), family = 'gaussian', data = df)
}
Many thanks!
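One common way to keep such a loop running is to wrap each fit in tryCatch() and store NULL for the fits that fail. The following is only a sketch under stated assumptions: the data frame, the column names a and b, and the deliberately invalid term not_a_column are all made up to force one failure, and a log link is used so that the valid fits converge reliably on this toy data.

```r
set.seed(1)
# Hypothetical data: a positive response x and two predictors
df <- data.frame(a = rnorm(50), b = rnorm(50))
df$x <- rgamma(50, shape = 2, rate = 2 / exp(0.3 * df$a))

# The last entry refers to a column that does not exist, so its fit fails
combinations <- list("a", "a + b", "not_a_column")
models <- vector("list", length(combinations))

for (i in seq_along(combinations)) {
  fit <- tryCatch(
    glm(as.formula(paste("x ~", combinations[[i]])),
        family = Gamma(link = "log"), data = df),
    error = function(e) {
      message("Skipping model ", i, ": ", conditionMessage(e))
      NULL
    }
  )
  # models[[i]] <- NULL would delete the element, so assign via list()
  models[i] <- list(fit)
}

failed <- vapply(models, is.null, logical(1))
```

Afterwards, which(failed) gives the positions of the skipped combinations, and models[!failed] keeps only the fits that succeeded.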

Bayesian Modelling in R

I am trying to implement a Bayesian model in R using the BAS package, with these settings for my model:
databas <- bas.lm(at_areabuilding ~ ., data = dataCOMMA, method = "MCMC", prior = "ZS-null", modelprior = uniform())
I am trying to predict the building area for a given state from the areas already known for that state, across different zip codes. My model finds the zip codes present in the data for a given state (using a state index) and then gives the output.
Whenever I want to predict the area for a state, I give this input:
> UT <- data.frame(zip = 84321, loc_st_prov_cd = "UT" ,state_idx = 7)
> predict_1 <- predict(databas,UT, estimator="BMA", interval = "predict", se.fit=TRUE)
> data.frame('state' = 'UT','estimated area' = predict_1$Ybma)
This gives me the output for that one state.
Suppose I have a list of states with given zip codes and I want to run my model (databas) on that list and get the predictions. I cannot do that with the approach above, as it would take too long. Is there another way to do this?
With help from another user, I arrived at this code:
pred <- sapply(1:nrow(first), function(row) { predict(basdata,first[row, ],estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma })
basdata: my model
first: my new dataset for which I am predicting the area.
The issue I am now facing is that this code takes a long time to predict the values. It iterates over every row and calculates the area. There are 150000 rows in my dataset, and I would appreciate help optimizing the performance of this code.
Something like this will iterate over each row of your data frame of states, zips and indices (let's call it states_and_zips) and return a list of predictions. Each element of this list (which I've called pred) goes with the corresponding row of states_and_zips:
pred = lapply(1:nrow(states_and_zips), function(row) {
  predict(databas, states_and_zips[row, ],
          estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma
})
If Ybma is a single value, then use sapply instead of lapply and it will return a vector of predictions, one for each row of states_and_zips, that you can add as a new column to states_and_zips.
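On the performance question: predict methods in R generally accept a newdata data frame with many rows and return all predictions in one call, which avoids the per-row overhead. The sketch below shows the pattern with a plain lm() fit as a stand-in, because it runs without extra packages; whether predict() on a bas.lm model handles the full data frame the same way is an assumption worth checking against ?predict.bas. All data and column names here are made up.

```r
set.seed(42)
# Hypothetical training data and a stand-in model (lm instead of bas.lm)
train <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
train$y <- 1 + 2 * train$x1 - train$x2 + rnorm(200)
fit <- lm(y ~ x1 + x2, data = train)

newdata <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))

# Row-by-row prediction: one predict() call per row
pred_loop <- sapply(seq_len(nrow(newdata)), function(row) {
  predict(fit, newdata[row, ])
})

# Single call on the whole data frame
pred_all <- predict(fit, newdata)

# Both approaches give the same numbers
same <- isTRUE(all.equal(unname(pred_loop), unname(pred_all)))
```

If the single call works for the BAS model too, the equivalent would be one predict() call with first as the new data, taking $Ybma from the result.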

'predict.pca' is not an exported object from 'namespace:mdatools'

I am fairly new to R and a beginner in programming. I have been using the mdatools package to perform PCA and make predictions with the resulting model, but I keep getting this error when trying to use the predict function:
Error: 'predict.pca' is not an exported object from 'namespace:mdatools'
I already searched with "??predict.pca" to check whether the function has been moved to another package, but it does not appear to have been.
I would truly appreciate any suggestions on how to fix this error.
Thank you.
This is part of the code:
library(dplyr)
library(mdatools)
library(pca3d)
pca_trainset = trainset %>% select( -class )
pca_testset = testset
pca_car = pca( pca_trainset, scale = T )
str(pca_car)
plot(pca_car,show.labels = T)
train = data.frame( class = trainset$class, pca_car$calres$scores )
t = as.data.frame( mdatools::predict.pca( pca_car, pca_testset ) )
If we look carefully at the help for help(predict.pca):
predict.pca {mdatools}
PCA predictions
Description
Applies PCA model to a new data set.
Usage
## S3 method for class 'pca'
predict(object, x, ...)
Arguments
object a PCA model (object of class pca).
x data values (matrix or data frame).
... other arguments.
Note that predict.pca is not an exported function, but rather an S3 method of the generic predict() for objects of class pca. Therefore, the appropriate way to use it is to call predict() and let R dispatch to the method:
t = as.data.frame( predict( pca_car, pca_testset ) )
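The same dispatch mechanism can be seen with base R alone. The sketch below uses prcomp() from the stats package as a stand-in for the mdatools model (an assumption made only so the example runs without extra packages) together with the built-in USArrests data: calling the generic predict() selects the method from the object's class, and getS3method() retrieves a registered method when you need the function itself.

```r
# Fit a PCA on the first 40 rows of a built-in data set
pca_fit <- prcomp(USArrests[1:40, ], scale. = TRUE)
class(pca_fit)   # "prcomp": this is what predict() dispatches on

# Call the generic; R picks the method for class "prcomp"
new_scores <- predict(pca_fit, newdata = USArrests[41:50, ])

# Retrieve the method explicitly without relying on it being exported
method <- getS3method("predict", "prcomp")
same <- isTRUE(all.equal(new_scores, method(pca_fit, newdata = USArrests[41:50, ])))
```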
See here for more.

S4 object creation in R

I am comparing different machine learning techniques in R. I wrote several functions that each automatically build a different prediction model (e.g. logistic regression, random forest, neural network, hybrid ensemble, etc.), together with its predictions, confusion matrix, several statistics (e.g. AUC and F-score), and different plots.
Now I would like to create a list of S4 (or S3?) objects in R, where each object contains the model, the predictions, the plots, the confusion matrix, the AUC and the F-score.
The idea is that each function creates such an object and then appends it to the object list in the return statement.
How should I program such a class? And how can I specify that each model can be of a different type? (I suppose that all models I create are S3 objects, so how can I declare this in my S4 class?)
The end result should allow something like this: modelList[i]@plot should, for example, produce the requested plot, and names(modelList[i]) should give the name of the model used (if this is not possible, modelList[i]@name will do). It should also be possible to select the best model out of the list based on a parameter such as AUC.
I am not experienced in creating such objects, so this is the code / idea I have at the moment:
modelObject <- setClass(
  # Set the name for the class
  "modelObject",
  # Define the slots ("ANY" is a placeholder where the type is still undecided)
  slots = c(
    modelName = "character",
    model = "ANY",              # should contain a glm, neural network, random forest, etc. model
    predictions = "ANY",        # should contain a matrix or data frame of custid and prediction
    rocCurve = "ANY",           # when summoned, the ROC curve should be plotted
    plotX = "ANY",              # when summoned, plot X should be plotted
    AUC = "numeric",            # contains the value of the AUC
    confusionMatrix = "matrix", # prints the confusion matrix in the console
    statX = "numeric"           # contains statistic X about the confusion matrix, e.g. F-score
  ),
  # Set the default values for the slots (optional)
  prototype = list(
    # I guess I can assign NULL to each variable of the S4 object
  ),
  # Make a function that can test to see if the data is consistent.
  # This is not called if you have an initialize function defined!
  validity = function(object) {
    # not really an idea how to handle this
    return(TRUE)
  }
)
Use setOldClass() to promote each S3 class to its S4 equivalent
setOldClass("lm")
setOldClass(c("glm", "lm"))
setOldClass(c("nnet.formula", "nnet"))
setOldClass("xx")
Use setClassUnion() to insert a common base class in the hierarchy
setClassUnion("lmORnnetORxx", c("lm", "nnet", "xx"))
.ModelObject <- setClass("ModelObject", slots=c(model="lmORnnetORxx"))
setMethod("show", "ModelObject", function(object) {
cat("model class: ", class(object@model), "\n")
})
In action:
> library(nnet)
> x <- y <- 1:10
> .ModelObject(model=lm(x~y))
model class: lm
> .ModelObject(model=glm(x~y))
model class: glm lm
> .ModelObject(model=nnet(x~y, size=10, trace=FALSE))
model class: nnet.formula nnet
I think that you would also like to implement a Models object that contains a list where all elements are ModelObject; the constraint would be imposed by a validity method (see ?setValidity).
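A minimal sketch of such a container follows. Everything beyond setClass(), setOldClass() and validity methods is an assumption for illustration: the slot names name, model and auc, the helper bestModel(), the AUC values, and the restriction to lm fits (to keep the example free of extra packages) are all hypothetical.

```r
library(methods)

setOldClass("lm")

# Hypothetical element class: a named model with its AUC
ModelObject <- setClass("ModelObject",
  slots = c(name = "character", model = "lm", auc = "numeric"))

# Container whose validity method enforces the element type
Models <- setClass("Models",
  slots = c(models = "list"),
  validity = function(object) {
    ok <- vapply(object@models, is, logical(1), "ModelObject")
    if (all(ok)) TRUE else "all elements of 'models' must be ModelObject instances"
  })

# Hypothetical helper: return the element with the highest AUC
bestModel <- function(x) {
  aucs <- vapply(x@models, function(m) m@auc, numeric(1))
  x@models[[which.max(aucs)]]
}

m1 <- ModelObject(name = "linear", model = lm(dist ~ speed, data = cars), auc = 0.70)
m2 <- ModelObject(name = "quadratic",
                  model = lm(dist ~ poly(speed, 2), data = cars), auc = 0.75)
all_models <- Models(models = list(m1, m2))
best <- bestModel(all_models)

# A list with a non-ModelObject element is rejected by the validity method
bad <- tryCatch(Models(models = list(m1, 1)), error = function(e) "rejected")
```

The validity function runs whenever an object is created with slots supplied, so the check does not need to be repeated in code that consumes the list.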
What I would do is determine, for each slot you want in your modelObject class, the range of expected values. For example, your model slot has to support all the possible classes of objects that can be returned by model training functions (e.g. lm(), glm(), nnet(), etc.). In the example case, you see the following classes returned:
```
library(nnet)  # provides nnet()
x <- y <- 1:10
class(lm(x~y))
class(glm(x~y))
class(nnet(x~y, size=10))
```
Since there is no common class among the objects returned, it might make more sense to use an S3 class, which has a less rigorous syntax and would allow you to assign various classes of output to the same field name. Your question is actually quite tough to answer, given that there are so many different approaches to take with R's myriad OO systems.

Write custom classifier in R and predict function

I would like to implement my own custom classifier in R, e.g. myClassifier(trainingSet, ...), which returns the learnt model m from a specified training set. I would like to call it just like any other classifier in R:
m <- myClassifier(trainingSet)
and then I want to overload (I don't know if this is the correct word) the generic function predict():
result <- predict(m, myNewData)
I have only basic knowledge of R, and I don't know which resources I should read in order to accomplish this. Do I need to create a package for this to work? I am looking for some initial directions.
Does the model m contain information about the overridden predict method? Or how does R know which predict.* method corresponds to model m?
Here is some code that shows how to write a method for your own class for a generic function.
# create a function that returns an object of class myClassifierClass
myClassifier = function(trainingData, ...) {
model = structure(list(x = trainingData[, -1], y = trainingData[, 1]),
class = "myClassifierClass")
return(model)
}
# create a method for the generic function predict for class myClassifierClass
# (the signature follows the generic: predict(object, ...))
predict.myClassifierClass = function(object, ...) {
  return(rlogis(length(object$y)))
}
# test
mA = matrix(rnorm(100*10), nrow = 100, ncol = 10)
modelA = myClassifier(mA)
predict(modelA)
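To address the dispatch question directly: the model does not store the method. It only carries a class attribute, and when predict(m, ...) is called, the generic looks up a function named predict.<class> for each entry of class(m) in turn. No package is needed for this to work in a script. The sketch below is a deliberately trivial, hypothetical classifier (it always predicts the majority label and assumes the first column holds the label), just to show a predict method that accepts new data.

```r
# Constructor: "training" only records the most frequent label
# (assumes the first column of trainingSet holds the class label)
myClassifier <- function(trainingSet, ...) {
  labels <- as.character(trainingSet[[1]])
  majority <- names(which.max(table(labels)))
  structure(list(majority = majority), class = "myClassifierClass")
}

# Method for the generic predict(); found via class(m), not stored in m
predict.myClassifierClass <- function(object, newdata, ...) {
  rep(object$majority, nrow(newdata))
}

train <- data.frame(label = c("a", "b", "b"), f = c(1, 2, 3))
m <- myClassifier(train)

class(m)                                   # the attribute used for dispatch
result <- predict(m, data.frame(f = c(10, 20)))
```

Calling unclass(m) shows that the object is an ordinary list; removing the class attribute is enough to stop predict() from finding the method.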
