R PMML class distribution - r

While trying to export an R classifier to PMML, using the pmml package, I noticed that the class distribution for a node in the tree is not exported.
PMML supports this with the ScoreDistribution element: http://www.dmg.org/v1-1/treemodel.html
Is there any way to have this information in the PMML? I want to read the PMML with another tool that depends on this information.
I'm doing something like:
library(randomForest)
library(pmml)
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE,proximity=TRUE)
pmml(iris.rf)

Can you provide some more information, such as which function you are trying to use?
For example, if you are using the randomForest package, I believe it doesn't provide information about the score distribution, so neither can the PMML representation. However, if you are using the default values, the parameter 'nodesize' for classification cases equals 1, which means each terminal node will have a ScoreDistribution such as:
<ScoreDistribution value="predictedValue" probability="1.0"/>
<ScoreDistribution value="AnyOtherTargetCategory" probability="0.0"/>
If you are using the rpart tree model, the pmml function does output the score distribution information. Perhaps you can give us the exact commands you used?
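A minimal sketch of the rpart case mentioned above, assuming the rpart, pmml and XML packages are installed. The object names (iris.rpart, iris.pmml, xml_text) are illustrative, and serialising with saveXML() to search the text is just one convenient way to inspect the result:

```r
library(rpart)
library(pmml)

# Fit a classification tree on the same data as in the question
iris.rpart <- rpart(Species ~ ., data = iris)

# Export to PMML; for rpart trees the nodes should carry ScoreDistribution elements
iris.pmml <- pmml(iris.rpart)

# Serialise to a string and check that the element is present
xml_text <- XML::saveXML(iris.pmml)
has_score_distribution <- grepl("ScoreDistribution", xml_text, fixed = TRUE)
print(has_score_distribution)
```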

Related

Specify an object to model a random effect in mixed model

I would like to run a mixed model in R, specifying the object that defines the structure of a random effect.
My model is this:
model1=lme(methane~fixedfactor1,
random=(~1|factory),
data=df,method="REML")
I have then an object named "factory_relationship" that I would like to use to model the structure of the random effect "factory".
If it can help, I did it in SAS by using the following:
proc mixed data=methane_data NOINFO;
class fixedfactor1;
model methane= fixedfactor1;
random factory/type=lin(1) LDATA=factory_relationship;
run;
However, I could not find any solutions in R.
Could you please help me?
Best
I tried to read the PDF guidelines for both "nlme" and "lme4" R packages, but I could not find any hint.

R save xgb model command error: 'model must be xgb.Booster'

'bst' is the name of an xgboost model that I built in R. It gives me predicted values for the test dataset using this code. So it is definitely an xgboost model.
pred.xgb <- predict(bst , xdtest) # get prediction in test sample
cor(ytestdata, pred.xgb)
Now I would like to save the model so that someone else can use it with their own data set, which has the same predictor variables and the same variable to be predicted.
Consistent with page 4 of xgboost.pdf, the documentation for the xgboost package, I use the xgb.save command:
xgb.save(bst, 'xgb.model')
which produces the error:
Error in xgb.save(bst, "xgb.model") : model must be xgb.Booster.
Any insight would be appreciated. I searched Stack Overflow and could not locate relevant advice.
Mike
It's hard to know exactly what's going on without a fully reproducible example. But just because your model can make predictions on the test data, it doesn't mean it's an xgboost model. It can be any type of model with a predict method.
You can try class(bst) to see the class of your bst object. It should return "xgb.Booster", though I suspect it won't here (hence your error).
On another note, if you want to pass your model to another person using R, you can just save the R object rather than exporting to binary, via:
save(bst, file = "model.RData")
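The class check and the save/load round trip can be sketched as follows. To keep the snippet runnable without xgboost, bst here is a stand-in lm fit (an assumption for illustration, not the asker's model), and the file is written to a temporary directory:

```r
# Stand-in model: NOT an xgboost model, used only so the example runs anywhere
bst <- lm(mpg ~ wt, data = mtcars)

# Inspect the class; xgb.save() expects an object inheriting from "xgb.Booster"
class(bst)
is_booster <- inherits(bst, "xgb.Booster")  # FALSE for this stand-in

# Save the R object itself; note the named 'file' argument and the quoted path
path <- file.path(tempdir(), "model.RData")
save(bst, file = path)

# The recipient loads it back; load() restores the object under its original name
e <- new.env()
load(path, envir = e)
restored <- e$bst

# Predictions from the restored object match the original
same_predictions <- isTRUE(all.equal(predict(restored, mtcars), predict(bst, mtcars)))
```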

Obtaining the Linear Regression Model at each Leaf for M5P model

I am trying to figure out how to get the linear model at each leaf of a tree generated by the M5P method in the RWeka library in R, as output to a text file, so that I can write a separate lookup calculator program (say, in Excel for non-R users).
I am using
library (RWeka)
model = M5P(response ~ predictorA + predictorB, data = train)
I can get the tree output as model$classifier in a matrix. This works great thanks to this post.
If I give the command:
model
R prints model$classifier (the tree structure), followed by the LM at each leaf. I want to extract the coefficients of the LM at each leaf.
Using the following code, I am able to get the LM coefficients out of R:
library(rJava)
ModelTree=as.matrix(scan(text=.jcall(model$classifier, "S","toString") ,sep="\n", what="") )[-c(1:2, 6), ,drop=FALSE]
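One possible way to turn that text into per-leaf coefficient vectors is sketched below. The function parse_m5p_lms and the sample text are hypothetical. The sample only imitates the general shape of the printed model ("LM num: k" followed by one term per line), so the pattern may need adjusting for real output, which would come from .jcall(model$classifier, "S", "toString"):

```r
# Parse "LM num: k" blocks into named numeric vectors (hypothetical helper)
parse_m5p_lms <- function(txt) {
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  starts <- grep("^LM num: [0-9]+$", lines)
  ends <- c(starts[-1] - 1, length(lines))
  # A term line is: optional sign, a number, and optionally "* variable"
  pat <- "^([+-]?)\\s*([0-9][0-9.eE+-]*)(\\s*\\*\\s*(\\S+))?$"
  out <- lapply(seq_along(starts), function(i) {
    body <- lines[seq(starts[i], ends[i])][-1]
    m <- regmatches(body, regexec(pat, body, perl = TRUE))
    m <- m[lengths(m) > 0]  # drop lines that are not terms
    vals <- vapply(m, function(x) as.numeric(paste0(x[2], x[3])), numeric(1))
    nms <- vapply(m, function(x) if (nzchar(x[5])) x[5] else "(Intercept)", character(1))
    setNames(vals, nms)
  })
  names(out) <- paste0("LM", sub("^LM num: ", "", lines[starts]))
  out
}

# Illustrative input only; not output from a real model
sample_txt <- paste(c(
  "LM num: 1", "response = ", "\t0.5 * predictorA ", "\t- 1.2 * predictorB ", "\t+ 3.4",
  "", "LM num: 2", "response = ", "\t2 * predictorA ", "\t+ 0.7"
), collapse = "\n")

lms <- parse_m5p_lms(sample_txt)
print(lms)
```

Each list element can then be written out with write.csv() or similar for use in a spreadsheet.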

How to make predictions using a pmml file in R

I created an xml file using pmml function from pmml library in R.
adamodel_iOS=ada(label~.,data=train_iOS, iter=ntrees, verbose=TRUE, loss="ada", bag.frac=0.7, nu=0.1, control=defctrl, type="real")
Ptrain_iOS = predict(adamodel_iOS,newdata=train_iOS, type="prob")
library(pmml)
adapmml_iOS=pmml(adamodel_iOS)
saveXML(adapmml_iOS,"model_iOS.xml")
save.image()
After training the model in the first line, I computed the corresponding probabilities for the training data.
Now I want to use this xml file to generate predictions on a set of data (basically the training set again). How do I do that in R? I see that in Java and Spark we can load the xml file generated by the pmml function, and there are functions that can then make predictions.
Basically, I am looking for a function in R that can take this xml file as an input and then return an object which in turn takes some datapoints as input and return their probabilities of having label 0 and 1.
I found a link:
Can PMML models be read in R?
but it does not help
Check this link for the list of PMML producers and consumers. As you can see, R is listed as a producer, not a consumer. The algorithms for which R can produce the corresponding PMML files are also listed.
The most comprehensive tool for validating and converting PMML, and for scoring data using PMML models, is ADAPA, which is not free.
KNIME is an open source drag-and-drop analytics tool which supports both import and export of PMML files (not for all models, and the features are limited). It supports R, Python, and Java too.
Although this question is old, I still want to share that you can use the "reticulate" package to call the Python pypmml package from R. To make prediction look more like the predict function in R, I have also wrapped this in a package, linked here.
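A rough sketch of the reticulate route, assuming reticulate is installed and that the Python environment it uses has pypmml (which in turn needs a Java runtime) and pandas available. The wrapper name score_pmml is hypothetical, and the exact shape of the returned predictions depends on the model:

```r
# Hypothetical wrapper: load a PMML file through pypmml and score a data frame
score_pmml <- function(pmml_path, newdata) {
  pypmml <- reticulate::import("pypmml")     # Python module, imported lazily
  model <- pypmml$Model$fromFile(pmml_path)  # pypmml's loader for a PMML file
  # reticulate converts the R data frame to a pandas DataFrame on the way in
  model$predict(newdata)
}

# Usage, with the file and data from the question (not run here):
# probs <- score_pmml("model_iOS.xml", train_iOS)
```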

R randomForest to PMML class index is wrong

I'm exporting an R randomForest model to PMML. The resulting PMML always has the class as the first element of the DataDictionary element, which is not always true.
Is there some way to fix this, or at least to augment the PMML with custom Extension elements? That way I could put the class index there.
I've looked in the pmml package documentation, as well as in the pmmlTransformations package, but couldn't find anything that could help me solve this issue.
By PMML class, I assume you mean the model type (classification vs. regression) in the PMML model attributes?
If so, it is not true that the model type is determined from the data type of the first element of the DataDictionary; the two are completely independent. The model type exported by the pmml function is whatever type the R random forest object reports (model$type). If you want your model to be a certain type, just make sure you let R know that. For example, if you are using the iris data set and your predicted variable is Sepal.Length, R will correctly assume it is a regression model. If you insist on treating it as a classification model, try using as.factor(Sepal.Length) instead.
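The point about model$type can be checked directly. A small sketch, assuming the randomForest package is installed; the object names and the reduced ntree are illustrative choices:

```r
library(randomForest)

set.seed(1)
# Numeric response: randomForest treats this as regression
rf_reg <- randomForest(Sepal.Length ~ ., data = iris, ntree = 50)
# Factor response: randomForest treats this as classification
rf_cls <- randomForest(Species ~ ., data = iris, ntree = 50)

# The type stored on the fitted object is what the export relies on
print(rf_reg$type)
print(rf_cls$type)
```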
