Convert a party object to PMML in R

I am currently trying to convert a decision tree created with the R package partykit (a party object) to the PMML format. Are there packages that allow for this conversion? I am aware of the existence of the pmml package, but it only supports rpart objects, created with the R package rpart. As I want to construct decision trees myself rather than fit them to a dataset, simply using rpart instead of partykit is not a solution.
Thank you for your suggestions,
Niels

At the moment the partykit package does not provide this feature. The package has many converters from other objects to party objects, including a converter from PMML. However, there are few options for converting party objects to other classes. A PMML export would certainly be nice to have (at least for the special case of constparty objects), but so far I haven't looked into what would be needed for this.
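For reference, here is a sketch of the conversion directions that do exist today, assuming current CRAN versions of rpart, partykit, and pmml (pmmlTreeModel() is partykit's PMML reader mentioned above):

```r
library(rpart)     # ships with R as a recommended package
library(partykit)  # CRAN
library(pmml)      # CRAN

# Fit a tree with rpart, the class the pmml package does support
fit <- rpart(Species ~ ., data = iris)

# rpart -> PMML: the pmml package has a method for rpart objects
pmml_doc <- pmml(fit)

# rpart -> party: partykit's as.party() converter
tree_party <- as.party(fit)

# PMML -> party: partykit can read a PMML tree file back in
# tree_from_pmml <- pmmlTreeModel("tree.pmml")
```

Note that the party -> PMML direction discussed in the question is exactly the arrow missing from this picture.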

Related

How should trained ML models be incorporated into a package that uses those trained models?

I have been working on an ML project (done inside an R project) that resulted in some ML models (built with caret), along with code that uses those models for additional analysis.
As the next phase, I am "deploying" these models by creating an R package that my collaborators can use for analysis of new data, where that analysis includes using the trained ML models. This package includes functions that generate reports, and embedded in those reports is the application of the trained ML models to the new data sets.
I am trying to identify the "right" way to include those trained models in the package. (Note, currently each model is saved in its own .rds file).
I want to be able to use those models inside of package functions.
I also want to consider the possibility of "updating" the models to a new version at a later date.
So ... should I:
1. Include the .rds files in inst/extdata
2. Include them as part of R/sysdata.rda
3. Put them in an external data package (which seems reasonable, except that almost all examples in tutorials expect a data package to include data.frame-ish objects)?
With regard to that third option ... I note that these models likely imply some additional NAMESPACE issues, as the models will require a whole bunch of caret-related functionality to be usable. Is that NAMESPACE modification required in the "data" package, or in the package that I am building that will use the models?
My first instinct would be to go for option 1. There is no need for other formats such as PMML, as you only want to run the models within R, so I consider native R serialization best. As long as your models are not huge, it should be fine to share them with collaborators (though maybe not in a CRAN package). I see that option 3 sounds convenient, but why separate the models from the functions? Freshly trained models would then come with a new package version anyway, since you would need to release the data package again. I don't see much gain this way, but I don't have much experience with data packages.
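A sketch of option 1; the package name mypkg and the model file name are placeholders:

```r
# During development, copy the trained models into inst/extdata/
# (files under inst/ are installed at the package's top level):
#   mypkg/inst/extdata/model_a.rds

# Inside a package function, locate and load a model at run time:
load_model <- function(name) {
  path <- system.file("extdata", paste0(name, ".rds"),
                      package = "mypkg", mustWork = TRUE)
  readRDS(path)
}

# Usage in a report-generating function:
# model <- load_model("model_a")
# predict(model, newdata = new_data)
```

Updating a model then just means replacing the .rds file and releasing a new package version.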

Can we import a random forest model built using SparkR into R and then use getTree to extract one of the trees?

Like in a decision tree, where we can see or visualize the node splits, I want to do something similar. But I am using SparkR, and it does not have decision trees. So I am planning to use a random forest with 1 tree as a parameter, run it on SparkR, then save the model and use getTree to see the node splits and visualize them further using ggplot.
The short answer is no.
Models built with SparkR are not compatible with ones built with the respective R packages, in this case randomForest; hence, you will not be able to use the getTree function from the latter to visualize a tree from a random forest built with SparkR.
On a different level: I am surprised that decision trees have still not found their way into SparkR - they have looked ready in the GitHub repo for several months now; but even when they arrive, they are not expected to offer methods for visualizing trees, and you will still not be able to use functions from other R packages for that purpose.
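For comparison, here is what the native (non-SparkR) workflow looks like, assuming the randomForest package is installed:

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 1)

# getTree() works here because rf is a native randomForest object;
# a SparkR model does not have this internal structure
one_tree <- getTree(rf, k = 1, labelVar = TRUE)
head(one_tree)  # left/right daughter, split variable, split point per node
```

This is the kind of object you could then hand to ggplot-based plotting code.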

Does the "gbm" package in R have basis functions other than decision trees?

If not, which packages implement multiple basis functions for boosting methods? Thanks a lot.
It doesn't look like it, since you need to specify the number of trees in the gbm call. You can try the mboost package, which lets you specify different types of base learners.
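A sketch of what that looks like with mboost, assuming it is installed; bols() and bbs() are its linear and penalized-spline base learners (btree() would add tree stumps to the mix):

```r
library(mboost)

# Boosting with non-tree base learners:
# bols() = linear effect, bbs() = penalized spline effect
fit <- mboost(mpg ~ bols(wt) + bbs(hp), data = mtcars)

# selected(fit) shows which base learner was picked in each iteration
table(selected(fit))
```

In each boosting iteration, the best-fitting base learner is chosen, so you can mix learner types freely in one model.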

Do I have to export functions conditionally imported via :: in R?

I am extracting information from objects whose classes are defined in various R packages. For example, I extract coefficients from various statistical models (for which coef methods are not always implemented). I usually don't have to import those packages because I defined a generic function for which methods can be added by users. There is one method for each kind of statistical model, and it would be stupid to import all those model definitions if the user is only interested in one specific model type.
In some cases, however, I need to use a function which is defined in a package - for instance, the confint.merMod method in the lme4 package. Up to now, I have used package::function to call these functions and wrapped the call in an exists() if-condition to make sure that the package really offers this function (because the function may be available only in some versions of the package).
However, I just discovered on http://developer.r-project.org/blosxom.cgi/R-devel/NEWS (see Sep 5, 2013) that in R version 3.0.2, "‘R CMD check’ does more thorough checking of declared packages and namespaces. It reports [...] objects imported by ‘::’ which are not exported."
Does this mean that I really have to add export("function") to the NAMESPACE file? Wouldn't the CMD check complain because the function is only imported conditionally?
To clarify/summarize for future visitors...
The R-devel NEWS quoted in the question relates to what is now the 3.0-branch NEWS file, where the specific entry lists the particular cases for which R CMD check will report issues with the imports, usage, and declarations of packages and functions.
If you have questions regarding a particular warning the list is worth a look.
For further (and more in-depth) information, see the R wiki page for links to the Writing R Extensions development guide and other helpful material.
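The conditional pattern from the question can also be written so that no NAMESPACE entry is needed at all; a sketch using confint.merMod as in the question (the wrapper name is my own, and get() from the namespace sidesteps the export question entirely):

```r
confint_mer <- function(object, ...) {
  # Fail gracefully if lme4 is not installed at all
  if (!requireNamespace("lme4", quietly = TRUE))
    stop("package 'lme4' is required here")
  # Guard against lme4 versions that lack the function
  # (the question's concern)
  if (!exists("confint.merMod", envir = asNamespace("lme4")))
    stop("this version of 'lme4' does not provide confint.merMod")
  # Fetch the function from the namespace; this works whether or
  # not the method is exported, so no Imports entry is needed
  fun <- get("confint.merMod", envir = asNamespace("lme4"))
  fun(object, ...)
}
```

Because the package is never touched at load time, R CMD check has nothing to complain about regarding undeclared imports.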

decision trees with forced structure

I have been using decision trees (CART) in R using the rpart package to look at the relationship between SST (predictor variables) and climate (predictand variable).
I would like to "force" the tree into a particular structure - i.e. split on predictor variable 1, then on variable 2.
I've been using R for a while, so I thought I'd be able to look at the code behind the rpart function and modify it to search for 'best splits' in a particular predictor variable first. However, the rpart function calls C routines, and not having any experience with C, I get lost there...
I could write a function from scratch but would like to avoid it if possible! So my questions are:
1. Is there another decision tree technique (preferably implemented in R) in which you can force the structure of the tree?
2. If not - is there some way I could convert the C code to R?
3. Any other ideas?
Thanks in advance, and help is much appreciated.
When your data indicate a tree with a known structure, present that structure to R using either the Newick or NEXUS file format. Then you can read the structure in using read.tree or read.nexus from the ape package (which implements the phylo tree class).
Maybe you should look at the method formal parameter of rpart. In the documentation:
... ‘method’ can be a list of functions named ‘init’, ‘split’ and ‘eval’. Examples are given in the file ‘tests/usersplits.R’ in the sources.
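Adapted from the approach in rpart's tests/usersplits.R, here is a minimal sketch of an anova-style user method (all function names below are my own). This shows the extension point the answer refers to; restricting which predictors get a competitive goodness score at a given depth is where a forced split order could be wired in:

```r
library(rpart)

# 'eval': label a node with its weighted mean and deviance (RSS)
etemp <- function(y, wt, parms) {
  wmean <- sum(y * wt) / sum(wt)
  list(label = wmean, deviance = sum(wt * (y - wmean)^2))
}

# 'split': score the n-1 possible cut points of a continuous predictor;
# rpart passes x sorted, so cumulative sums give left/right statistics
stemp <- function(y, wt, x, parms, continuous) {
  if (!continuous)
    stop("this sketch handles continuous predictors only")
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)          # center the response
  temp     <- cumsum(y * wt)[-n]
  left.wt  <- cumsum(wt)[-n]
  right.wt <- sum(wt) - left.wt
  lmean    <- temp / left.wt
  rmean    <- -temp / right.wt
  list(goodness  = (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2),
       direction = sign(lmean))
}

# 'init': set up the response and a print summary for the tree
itemp <- function(y, offset, parms, wt) {
  if (!is.null(offset)) y <- y - offset
  list(y = y, parms = 0, numresp = 1, numy = 1,
       summary = function(yval, dev, wt, ylevel, digits)
         paste("mean =", format(signif(yval, digits))))
}

fit <- rpart(dist ~ speed, data = cars,
             method = list(eval = etemp, split = stemp, init = itemp))
```

The goodness vector returned by the split function is what rpart compares across predictors, so returning zeros for a predictor effectively vetoes it at that node.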
