I want to generate a decision tree where I can choose the splitting variable used at the root and at any particular node, and I want to be able to change the splitting rules as well. Is there any package in R which can do this, or does anyone know of any open-source software which can help me do this?
I am trying to translate some code that we previously used in a piece of software similar to Power BI into a form that is compatible with Power BI. One thing I need to do is fit a model to some data and then use that fit to display information about it (in several further visual elements).
From a sequential point of view this is trivial: generate an object, then work on that object and print some results. But from what I understand about Power BI, this kind of interdependency between R scripts / visual elements (generate an object, then hand that object to other procedures to generate further output) is not intended. Since I need to use several visual elements, and all of them depend on the output of the first, I have no idea how to work this out.
I need to use several visual elements, and all of them depend on the output of the first
Then the data needs to be created in Power Query and loaded into the data model. You can run R in Power Query to generate the data, and visualize it with regular Power BI Visuals and the R Visual.
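For example, a Run R script step in Power Query might look something like this (a minimal sketch only; dataset is the name Power Query gives the incoming table, while the column names y and x and the linear model are hypothetical stand-ins for your actual fit):
```r
# 'dataset' is the table handed to the Run R script step by Power Query
fit <- lm(y ~ x, data = dataset)

# Table 1: the original data plus fitted values and residuals
scored <- dataset
scored$fitted   <- fitted(fit)
scored$residual <- residuals(fit)

# Table 2: the coefficient estimates
coefs <- data.frame(term = names(coef(fit)), estimate = coef(fit), row.names = NULL)

# Every data.frame left in the environment (scored, coefs) is offered as an
# output table that can be loaded into the data model
```
Each visual then reads from the loaded tables instead of depending on another visual's R script.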
In the H2O GBM model, the following column-sampling parameters are used:
col_sample_rate
col_sample_rate_per_tree
col_sample_rate_change_per_level
I understand how the sampling works and how many variables are considered for splitting at each level of every tree. I am trying to understand how many times each feature is considered when making a decision. Is there a way to easily extract, from the model object, the sample of features used for each splitting decision?
Referring to the explanation provided by H2O at http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/col_sample_rate.html, is there a way to know which 60 randomly chosen features were considered for each split?
Thank you for your help!
If you want to see which features were used at a given split in a given tree, you can navigate the H2OTree object (a short R sketch follows the links below).
For R see documentation here and here
For Python see documentation here
You can also take a look at this blog (if the link ever dies, just do a Google search for the H2OTree class).
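A minimal R sketch of that navigation (assuming an h2o version that provides h2o.getModelTree(), roughly 3.22 or later; the iris model is just for illustration, and the @features slot name follows the H2OTree class documentation):
```r
library(h2o)
h2o.init()

# Hypothetical toy model; replace with your own GBM
df  <- as.h2o(iris)
gbm <- h2o.gbm(x = 1:4, y = "Species", training_frame = df,
               ntrees = 5, col_sample_rate = 0.6)

# Pull one tree back into R as an H2OTree object
tree <- h2o.getModelTree(model = gbm, tree_number = 1, tree_class = "setosa")

# @features holds the feature actually chosen at each node's split (NA for leaves)
split_features <- tree@features
table(na.omit(split_features))   # how often each feature was used in this tree
```
Note that this shows the feature chosen at each split; the full random sample of candidate columns considered at a split is not stored in the model object.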
I don’t know if I would call this easy, but the MOJO tree visualizer spits out a Graphviz dot data file which can be turned into a visualization. This has the information you are interested in.
http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/overview-summary.html#viewing-a-mojo
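A rough sketch of that workflow from R (the model object gbm, the paths, and the output file names are hypothetical; the PrintMojo invocation follows the linked documentation):
```r
# Download the MOJO plus the genmodel jar needed to read it
mojo_file <- h2o.download_mojo(gbm, path = tempdir(), get_genmodel_jar = TRUE)

# Dump tree 0 as Graphviz dot data (requires Java on the PATH)
system(paste("java -cp", shQuote(file.path(tempdir(), "h2o-genmodel.jar")),
             "hex.genmodel.tools.PrintMojo --tree 0",
             "-i", shQuote(file.path(tempdir(), mojo_file)),
             "-o model.gv"))

# Render with Graphviz:  dot -Tpng model.gv -o model.png
```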
I am pretty new to the RNetLogo package in R, but so far all the examples of using RNetLogo that I have seen are about loading sample models and doing something with them. I have not seen any examples showing that we can create our own model and write the rules according to which our agents interact with each other (or see the code of the sample models and change it). Is it possible to write these rules in R, or does RNetLogo only allow us to play with already implemented models (samples) without changing their code?
For example, when we open NetLogo's Models Library --> Earth Science --> Climate Change (just a random example), we can go to the Code tab and see the code written in the NetLogo programming language:
globals [
sky-top ;; y coordinate of top row of sky
...
My question is: can we see this code in R and change it?
My answer: I do not think so :-). You need to develop your model in NetLogo; with RNetLogo you can then run it from R, play with the data, send a data.frame to your model, and change some variables.
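For instance, a minimal RNetLogo session might look like this (a sketch only: the installation path is hypothetical, and the sun-brightness and temperature names come from the Climate Change sample model):
```r
library(RNetLogo)

nl.path <- "C:/Program Files/NetLogo 5.3.1/app"   # adjust to your installation
NLStart(nl.path, gui = FALSE)
NLLoadModel(file.path(nl.path,
            "models/Sample Models/Earth Science/Climate Change.nlogo"))

# You cannot edit the model's Code tab from R, but you can drive the model:
NLCommand("setup")                          # call a NetLogo procedure
NLCommand("set sun-brightness 1.5")         # change a global variable
NLDoCommand(100, "go")                      # run 100 ticks
temperature <- NLReport("temperature")      # pull a value back into R

NLQuit()
```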
I am trying to get comfortable with the 'rattle' package in R. I am having issues building a neural network using this package.
I have a training data set of 140 columns and 200,000 rows, and a target variable that takes values from 0 to 4 depending on the class it belongs to. It is a classic pattern classification problem.
When I load my data into rattle, the 'neural network' option under the 'Model' tab is deactivated. Is there a prerequisite that my data doesn't fulfil?
I know I can use neural network specific packages to implement one, but the situation requires me to use rattle.
Any clues/suggestions are very much appreciated.
Thanks in advance!
Make sure that your dataset contains only numeric values. Otherwise, go to the Data tab and select 'Ignore' for the offending feature so that it is excluded from further calculation.
Check this link if you want to do the same without using the GUI:
http://www.r-bloggers.com/visualizing-neural-networks-in-r-update/
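For reference, the non-GUI route can look like this (a hedged sketch: rattle's neural network model is built on the nnet package, and the train data frame with a 0-4 target column is a hypothetical stand-in for your data):
```r
library(nnet)

# Classification, not regression: the 0-4 label must be a factor
train$target <- factor(train$target)

# Keep only numeric predictors (the same requirement that disables the
# neural network option in rattle when non-numeric columns are present)
keep <- sapply(train, is.numeric) | names(train) == "target"

fit <- nnet(target ~ ., data = train[, keep],
            size = 10, decay = 0.1, maxit = 200,
            MaxNWts = 5000)   # ~140 inputs x 10 hidden units exceeds the default weight limit

pred <- predict(fit, newdata = train[, keep], type = "class")
table(predicted = pred, actual = train$target)
```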
I have been using decision trees (CART), via the rpart package in R, to look at the relationship between SST (predictor variables) and climate (the predictand variable).
I would like to "force" the tree into a particular structure - i.e. split on predictor variable 1, then on variable 2.
I've been using R for a while, so I thought I'd be able to look at the code behind the rpart function and modify it to search for 'best splits' in a particular predictor variable first. However, the rpart function calls C routines, and not having any experience with C, I get lost there...
I could write a function from scratch but would like to avoid it if possible! So my questions are:
Is there another decision tree technique (preferably implemented in R) in which you can force the structure of the tree?
If not, is there some way I could convert the C code to R?
Any other ideas?
Thanks in advance, and help is much appreciated.
When your data indicates a tree with a known structure, you can present that structure to R using either the Newick or NEXUS file format, and then read it in with read.tree or read.nexus from the ape package (which represents trees as 'phylo' objects).
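A tiny sketch of that (assuming the ape package; the Newick string and NEXUS file name are made up):
```r
library(ape)

tr <- read.tree(text = "((A,B),(C,D));")   # or read.nexus("my_tree.nex") for NEXUS files
plot(tr)
```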
Maybe you should look at the method formal argument of rpart. From the documentation:
... ‘method’ can be a list of functions named ‘init’, ‘split’ and ‘eval’. Examples are given in the file ‘tests/usersplits.R’ in the sources.
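A condensed sketch of that interface, adapted from the anova example in rpart's tests/usersplits.R (continuous predictors only here; the trees dataset is just for illustration, and you would alter the split function to impose your own rules):
```r
library(rpart)

# init: describe the response and how to summarize a node
itemp <- function(y, offset, parms, wt) {
  if (!is.null(offset)) y <- y - offset
  list(y = y, parms = 0, numresp = 1, numy = 1,
       summary = function(yval, dev, wt, ylevel, digits)
         paste("  mean=", format(signif(yval, digits)),
               ", MSE=", format(signif(dev / wt, digits)), sep = ""))
}

# eval: label and deviance for a node
etemp <- function(y, wt, parms) {
  wmean <- sum(y * wt) / sum(wt)
  list(label = wmean, deviance = sum(wt * (y - wmean)^2))
}

# split: goodness of every candidate split point on one predictor x
stemp <- function(y, wt, x, parms, continuous) {
  if (!continuous) stop("this sketch handles continuous predictors only")
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)          # center y
  temp     <- cumsum(y * wt)[-n]
  left.wt  <- cumsum(wt)[-n]
  right.wt <- sum(wt) - left.wt
  lmean    <- temp / left.wt
  rmean    <- -temp / right.wt
  list(goodness  = (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2),
       direction = sign(lmean))
}

fit <- rpart(Volume ~ Girth + Height, data = trees,
             method = list(init = itemp, eval = etemp, split = stemp))
fit
```
Note that the split callback receives only the values of one candidate predictor at a time, not its name, so forcing a specific variable at the root may still be easier to do by fitting on that variable alone (maxdepth = 1) and then fitting sub-trees on the resulting partitions.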