Using the Rattle package in R

I am trying to get comfortable with the 'rattle' package in R. I am having issues building a neural network using this package.
I have a training dataset of 140 columns and 200,000 rows, and a target variable that takes values from 0 to 4 depending on the class an observation belongs to. It is a classic pattern classification problem.
When I load my data into rattle, the 'neural network' option under the 'Model' tab is disabled. Is there a prerequisite that my data doesn't fulfil?
I know I can use neural network specific packages to implement one, but the situation requires me to use rattle.
Any clues/suggestions are very much appreciated.
Thanks in advance!

Make sure that your dataset contains only numeric values. Otherwise, go to the Data tab and select 'Ignore' for the offending feature to exclude it from further calculation.
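A quick way to spot the offending columns before loading the data into rattle is a sketch like this (train is a hypothetical name for your data frame):

names(train)[!sapply(train, is.numeric)]   # columns that are not numeric
# Convert where it makes sense, e.g. a number stored as a factor:
train$some_col <- as.numeric(as.character(train$some_col))  # some_col is hypothetical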
Check this link if you want to build the network without using the GUI:
http://www.r-bloggers.com/visualizing-neural-networks-in-r-update/
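For what it's worth, rattle builds its neural network models with the nnet package under the hood, so a GUI-free equivalent is roughly the following sketch (train and the column name target are assumptions):

library(nnet)
train$target <- factor(train$target)   # the 0-4 classes must be a factor
# With 140 inputs, the default MaxNWts = 1000 is too small:
fit <- nnet(target ~ ., data = train, size = 10, decay = 5e-4,
            maxit = 200, MaxNWts = 2000)
head(predict(fit, train, type = "class"))

With a factor response of more than two levels, nnet automatically fits a softmax output layer, which matches your five-class setup.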

Related

How to use weights with the "crosstable" package for R

The crosstable package gives me exactly what I need to do some exploratory work on a dataset composed of answers to a survey. But I need to weight the crosstabulation to get results representative of the population I'm studying. Any ideas how I could use weights with this package?
So far I have used the "survey" package to do that, but it lacks presentation tools for producing publication-ready tables.
Thanks.
I'm the dev of the crosstable package, and unfortunately it does not support weights yet.
I would love to implement this as a feature one day, so you should definitely open a Feature Request on GitHub.
As I've never had to do a weighted description myself, please add a simplified version of your use case so that I can make something useful to everyone.
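In the meantime, a weighted crosstabulation with the survey package the question already mentions would look roughly like this (a sketch; d, q1, q2 and w are hypothetical names for the data, two answers and the weight column):

library(survey)
des <- svydesign(ids = ~1, weights = ~w, data = d)
svytable(~ q1 + q2, design = des)          # weighted counts
prop.table(svytable(~ q1 + q2, des), 1)    # row percentages

The formatting for publication would still have to be done by hand, which is exactly the gap the question points out.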

Subset of features on external memory

I have a large file that I'm not able to load into memory, so I'm using a local file with xgb.DMatrix. But I'd like to use only a subset of the features. The xgboost documentation says that the colset argument of slice is "currently not used", and there is no mention of this feature on the GitHub page. I haven't found any other clue about how to do column subsetting with external memory.
I want to compare models generated with different feature subsets. The only thing I could think of is to create a new file containing just the features I want to use, but it's taking a long time and will take a lot of memory... I can't help wondering if there is a better way.
P.S.: I tried the h2o package too, but h2o.importFile froze.
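One way to make the new-file idea workable is to stream the subset in fixed-size chunks, so peak memory stays at one chunk; here is a rough sketch with data.table (the file names, column indices and chunk size are illustrative assumptions):

library(data.table)

infile  <- "train_full.csv"
outfile <- "train_subset.csv"
keep    <- c(1L, 5L, 12L)    # indices of the label and features to keep
chunk   <- 100000L
skip    <- 0L
first   <- TRUE
repeat {
  dt <- tryCatch(
    fread(infile, select = keep, skip = skip, nrows = chunk, header = first),
    error = function(e) NULL)              # fread errors once past end of file
  if (is.null(dt) || nrow(dt) == 0L) break
  fwrite(dt, outfile, append = !first, col.names = first)
  skip  <- skip + nrow(dt) + as.integer(first)   # +1 for the header line
  first <- FALSE
}
# then build the xgb.DMatrix from outfile exactly as for the full file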

Customized decision tree splits and nodes using R

I want to generate a decision tree where I can choose the splitting variable to use at the root and below any particular node, and I want to be able to change the split rules as well. Is there any package in R which can do this, or does anyone know of any open-source software which can help me do this?
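The partykit package lets you construct a tree entirely by hand, choosing the split variable and split point at every node yourself; a minimal sketch on the built-in iris data, following the approach in the partykit vignette:

library(partykit)
data(iris)

# Split on Petal.Length at 2.5 - our choice, not an algorithm's:
sp <- partysplit(varid = which(names(iris) == "Petal.Length"), breaks = 2.5)
nd <- partynode(1L, split = sp,
                kids = list(partynode(2L), partynode(3L)))

tr <- party(nd, data = iris,
            fitted = data.frame(
              "(fitted)"   = fitted_node(nd, data = iris),
              "(response)" = iris$Species,
              check.names  = FALSE),
            terms = terms(Species ~ ., data = iris))
tr <- as.constparty(tr)   # adds print/plot/predict methods
predict(tr, iris[c(1, 51, 101), ])

Deeper trees with your own rules at each node are built the same way by nesting partynode objects.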

Comparison of good vs bad dataset using R

Stuck on a problem. There are two datasets, A and B; say they're datasets from two factories. Factory A is performing really well whereas Factory B is not. I have the dataset of Factory A (the data being output from the manufacturing units) as well as that of Factory B, both having the same variables. How can I identify the problematic variable in Factory B, the one which needs immediate attention and needs to be fixed so that Factory B starts performing well too?
Looking forward to your response.
p.s: coding language being used is R
Well, this is a shameless plug for the dataMaid package, which I helped write and which sort of does what you are asking. The idea of the dataMaid package is to run a battery of tests on the variables in a data frame and produce a report that a human investigator (preferably someone with knowledge of the context) can look through in order to identify potential problems.
A super simple way to get started is to load the package and use the clean function on a data frame (if you try to clean the same data frame several times, you may need to add the replace=TRUE argument to overwrite the existing report).
devtools::install_github("ekstroem/dataMaid")  # install the development version
library(dataMaid)
data(trees)
clean(trees)  # writes a report screening each variable in trees
This will create a report with summaries and error checks for each variable in the trees data frame. The report opens with a summary of all the variables, followed by a section for each variable giving its status: the variable type, summary statistics, a plot and - in this case - an indicator that there might be a problem with outliers.
The dataMaid package can also be used interactively by running checks for individual variables or for all variables in the dataset:
data(toyData)
check(toyData$var2) # Individual check of var2
check(toyData) # Check all variables at once
By default, a standard battery of tests is run depending on the variable type, but it is possible to extend the package by providing your own checks.
In your case I would run the package on both datasets to get two reports, and any major differences between them would raise a flag about what could be problematic.
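Alongside the two reports, a quick programmatic comparison can rank the variables by how much their distributions differ between the factories; a sketch, assuming dfA and dfB are data frames with identical columns:

num <- names(dfA)[sapply(dfA, is.numeric)]
d   <- sapply(num, function(v) unname(ks.test(dfA[[v]], dfB[[v]])$statistic))
sort(d, decreasing = TRUE)   # largest Kolmogorov-Smirnov distance first

The variables at the top of that list are the first candidates for the "needs immediate attention" label.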

How to use the DWD R package in order to remove biases and merge two microarray datasets

I am trying to find a way to use the distance weighted discrimination (DWD) method to remove biases from multiple microarray datasets.
My starting point is this. The problem is that the Matlab version runs only under Windows and needs Excel 5 format as input, where the data appears to be truncated at row 65535 (the Matlab error is: "Error reading record for cells starting at column 65535. Try saving as Excel 98."). The Java version runs only with caBIG support which, if I understood correctly, has been shut down recently.
So I searched a lot and found the R/DWD package, but from the examples I could not work out how to provide the two datasets to be merged to the kdwd function.
Does anybody know how to use it?
Thanks
Try this package, it has a DWD implementation:
http://www.bioconductor.org/packages/release/bioc/html/inSilicoMerging.html
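Based on the package's documentation, merging two ExpressionSet objects with DWD bias correction looks roughly like this (eset1 and eset2 are hypothetical names for your two datasets):

library(inSilicoMerging)
combined <- merge(list(eset1, eset2), method = "DWD")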
