How to load the WEKA preprocessing steps into R?
I used the WEKA GUI to preprocess my data, and I would now like to apply the same preprocessing steps in R.
For example, I want to reproduce in R a MultiFilter that I configured in the WEKA GUI, but I cannot find it in RWeka.
How can I load the WEKA preprocessing steps into R?
You can reproduce the WEKA GUI steps partially with RWeka, or with the Weka command-line tools, which are far more extensive than the functions available in RWeka. You can therefore extend RWeka by calling the command-line tools through the system command in R. Luckily, the parameters in the WEKA GUI and on the WEKA command line are the same. I recommend extracting weka-src.jar with jar xf weka-src.jar to read the source.
There are two MultiFilter classes:
java weka.filters.MultiFilter --help
java weka.filters.unsupervised.attribute.PartitionedMultiFilter --help
where the second allows you to specify the attribute range. Otherwise, they seem to be identical.
Then you can run your first Discretize filter with
java weka.filters.unsupervised.attribute.Discretize -F -B 20 -M -1.0 -R 27 -i yourFile.arff
and then direct its output to the next Discretize, and eventually to NumericTransform and Resample. Each filter prints detailed usage instructions when called with --help:
java weka.filters.unsupervised.attribute.NumericTransform --help
java weka.filters.unsupervised.attribute.Remove --help
java weka.filters.unsupervised.instance.Resample --help
java weka.filters.supervised.instance.Resample --help
You can also look the filters up in the directory structure of the source or in the index.
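The chaining described above can be sketched with ordinary pipes. This is a minimal sketch, assuming java is on the PATH, weka.jar is on the CLASSPATH, and that a filter run without -i reads ARFF from standard input and without -o writes to standard output. The function name filter_chain is made up for illustration, and both filters run with their default options; options copied from the GUI would be appended to each command.

```shell
#!/bin/sh
# filter_chain INPUT.arff
# Hypothetical helper: run Discretize on the input file, then feed the
# result through the unsupervised Resample filter. The filtered ARFF
# is written to standard output.
filter_chain() {
    java weka.filters.unsupervised.attribute.Discretize -i "$1" |
        java weka.filters.unsupervised.instance.Resample
}
```

A call such as filter_chain yourFile.arff > filtered.arff would then save the result for later use.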
RWeka
The RWeka package provides the functions
Discretize()
Normalize()
make_Weka_filter() to create R interfaces to Weka filters
There are no ready-made NumericTransform() or Remove() functions. The R functions take their own arguments, so you cannot simply copy and paste the Java command from the WEKA GUI. One solution is to run the Java command through the system command, without having to learn RWeka itself. There seems to be a gap between the WEKA GUI and the R package.
Running Weka on the Command Line
Even though these filters are missing from the RWeka interface, you can still call them with the system command in R. For example, you can run the Remove filter
java weka.filters.unsupervised.attribute.Remove -i yourfile.arff
such that
system("java weka.filters.unsupervised.attribute.Remove -i yourfile.arff")
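When the command is run from R this way, the filtered data has to get back into R somehow. One option is to have the filter write a file, which R could then read with an ARFF reader such as read.arff. Below is a minimal sketch, assuming the -o option names the output file as in the Weka filter usage text. The function name run_filter and its messages are made up for illustration.

```shell
#!/bin/sh
# run_filter FILTER_CLASS INPUT.arff OUTPUT.arff
# Hypothetical wrapper: run one Weka filter, write the result to a
# file, and report failure through the exit status so that the caller
# (for example system() in R) can check it.
run_filter() {
    if java "$1" -i "$2" -o "$3"; then
        echo "wrote $3"
    else
        echo "filter $1 failed" >&2
        return 1
    fi
}
```

Since system() returns the exit status of the command, a non-zero value would signal that the output file should not be trusted.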
Here is my setup, with a Discretize run on the iris data set in the Weka installation directory:
$ cat $WEKAINSTALL/data/iris.arff |tail
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
%
%
$ java weka.filters.unsupervised.attribute.Discretize -i $WEKAINSTALL/data/iris.arff |tail
'\'(6.46-6.82]\'','\'(2.96-3.2]\'','\'(5.13-5.72]\'','\'(2.26-inf)\'',Iris-virginica
'\'(6.82-7.18]\'','\'(2.96-3.2]\'','\'(4.54-5.13]\'','\'(2.26-inf)\'',Iris-virginica
'\'(5.74-6.1]\'','\'(2.48-2.72]\'','\'(4.54-5.13]\'','\'(1.78-2.02]\'',Iris-virginica
'\'(6.46-6.82]\'','\'(2.96-3.2]\'','\'(5.72-6.31]\'','\'(2.26-inf)\'',Iris-virginica
'\'(6.46-6.82]\'','\'(3.2-3.44]\'','\'(5.13-5.72]\'','\'(2.26-inf)\'',Iris-virginica
'\'(6.46-6.82]\'','\'(2.96-3.2]\'','\'(5.13-5.72]\'','\'(2.26-inf)\'',Iris-virginica
'\'(6.1-6.46]\'','\'(2.48-2.72]\'','\'(4.54-5.13]\'','\'(1.78-2.02]\'',Iris-virginica
'\'(6.46-6.82]\'','\'(2.96-3.2]\'','\'(5.13-5.72]\'','\'(1.78-2.02]\'',Iris-virginica
'\'(6.1-6.46]\'','\'(3.2-3.44]\'','\'(5.13-5.72]\'','\'(2.26-inf)\'',Iris-virginica
'\'(5.74-6.1]\'','\'(2.96-3.2]\'','\'(4.54-5.13]\'','\'(1.78-2.02]\'',Iris-virginica
$
Some useful information
Use Weka in your Java code
Download the Linux Developer version, unzip it and read the README, which has many examples of using WEKA, particularly on the command line.
Wiki here
Maybe irrelevant: Generating source code from WEKA classes
Related
Run R command without entering R and without a script
I want to run an R command from command line (actually, from within a Makefile). The command is roxygen2::roxygenise(), if it is relevant. I don't want to create a new file and run that as a script - that will just clutter my directory. In python, this is simple - you write python -c "import antigravity". I use the Makefile to build, install and test a (Rcpp) package I'm working on.
This is generally done with so-called 'shebang scripts'. Historically, littler was there first, about a decade or so ago. It is still widely used, and it contains a number of helper scripts, for example roxy.r, which does just what you desire: run roxygen2::roxygenize(). I use this all the time. Next, Rscript started to ship with R. It is similar to littler but automatically available wherever R is, which is a plus. On the minus side, it starts more slowly and fails to load the methods package, which is a source of a number of bug reports and SO questions. Much more recently, R itself added the ability to run expressions following the -e ... switch. So you have plenty of choices. You can also study the many src/Makevars files that use Rscript.
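As a rough illustration of the -e route, here is a small shell sketch that evaluates one R expression without creating a script file. It prefers Rscript and falls back to R --slave -e. The function name run_r_expr is made up for illustration, and the sketch assumes at least one of the two commands is installed.

```shell
#!/bin/sh
# run_r_expr 'EXPRESSION'
# Hypothetical helper: evaluate a single R expression from the shell.
# Uses Rscript when it is available, otherwise plain R with -e.
run_r_expr() {
    if command -v Rscript >/dev/null 2>&1; then
        Rscript -e "$1"
    else
        R --slave -e "$1"
    fi
}
```

For the question above the call would be run_r_expr 'roxygen2::roxygenise()', and a Makefile recipe could use the same Rscript -e command line directly.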
How to install jvmr package on databricks
I want to call an R function in a Scala script on Databricks. Is there any way to do it? I use
JVMR_JAR=$(R --slave -e 'library("jvmr"); cat(.jvmr.jar)')
scalac -cp "$JVMR_JAR"
scala -cp ".:$JVMR_JAR"
on my Mac, and it automatically opens a Scala session that can call R functions. Is there any way I can do something similar on Databricks?
On the Databricks Cloud, you can use sbt-databricks to deploy external libraries to the cloud and attach them to specific clusters, which are the two steps needed to make sure jvmr is available to the machines you're calling this on. See the plugin's GitHub README and the blog post. If those resources don't suffice, perhaps you should ask Databricks' support.
If you want to call an R function in the Scala notebook, you can use the %r shortcut.
df.registerTempTable("temp_table_scores")
Create a new cell, then use:
%r
scores <- table(sqlContext, "temp_table_scores")
local_df <- collect(scores)
someFunc(local_df)
If you want to pass the data back into the environment, you can save it to S3 or register it as a temporary table.
How can I print R documentation from a Linux command shell (e.g. bash)?
How can I check documentation for R code from a Linux command shell such as bash? I DO NOT mean an interactive session. With Perl, I can use perldoc to print out documentation at the command line:
perldoc lib
I was hoping for something simple like that for R. I don't always want to pull up a full interactive R session just to look up some documentation.
There might be other ways, but one that works for me is using the -e flag to execute code on the command line. I also use the --slave flag, which makes R run as quietly as possible (e.g. no R startup messages, no echoing of the input):
R --slave -e '?function'
I actually created a super small script I call rdoc to act like a simple R version of perldoc:
#!/bin/bash
R --slave -e "?$1"
After installing that in my ~/bin directory (or wherever you keep scripts on your PATH), it's easy:
rdoc function
If you want to look at documentation of a function from a particular package, prepend the package name followed by two colons. For example, to pull up documentation of the dmrFinder function from the charm package:
rdoc charm::dmrFinder
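The rdoc idea could also be written as a shell function with a small argument check, so that calling it without a topic prints a hint instead of starting R with an empty help request. This is only a sketch: the usage message and the exit code 2 are my additions, not part of the original script.

```shell
#!/bin/sh
# rdoc TOPIC
# Print the R help page for TOPIC (or package::topic) without opening
# an interactive session. Expects exactly one argument.
rdoc() {
    if [ "$#" -ne 1 ]; then
        echo "usage: rdoc topic | package::topic" >&2
        return 2
    fi
    R --slave -e "?$1"
}
```

Putting the function in a shell startup file avoids having to install a separate script on the PATH.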
Is there an approach for distributing R command-line scripts with an R package? [duplicate]
I am interested in providing a command line interface to an R package called Slidify that I am authoring. It uses Rscript and I think that would make it cross-platform. The scripts are stored in the subdirectory inst/slidify. In order to use the script from any directory, I added its path to my .bash_profile as I am on a Mac. My questions are:
How should I handle installation of the script in an automated cross-platform way?
How can I make sure that the file permissions are retained in this process?
What should the shebang line for the script be? I am currently using
#!/usr/bin/Rscript --vanilla --slave
I would appreciate pointers on how to handle this and any examples of R packages that already do it. Just to make sure I am clear on how this would work: a user would be able to generate a slide deck from slides.Rmd by just running
slidify generate slides.Rmd
from the command line.
UPDATE: Here is how I install it on a Mac from the command line. I use the excellent sub library by 37signals to create the scripts.
echo "$(path/to/clidir/slidify init -)" >> ~/.bash_profile
exec bash
Two follow-up questions:
Can I package these commands into an R function install_slidify_cli?
How can I mirror these commands for Windows users?
Lovin' slidify so would be glad to help. But in short, you can't. R packages simply cannot install outside of $R_HOME or the chosen library folder. Ship the script in the package, and tell users to copy it. If there was a better way, our littler package, a predecessor of / alternative to Rscript, would long have used it, and roxygen / roxygen2 would also have shipped something.
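The "tell users to copy it" step could be as small as the following sketch for Unix-like systems. The function name install_cli and the default target ~/bin are assumptions made for illustration, and the user would still need that directory on the PATH. The source path would be wherever the script ends up inside the installed package.

```shell
#!/bin/sh
# install_cli SCRIPT [DEST_DIR]
# Hypothetical helper: copy a script shipped inside an installed
# package to a directory of the user's choice (default: ~/bin) and
# make the copy executable.
install_cli() {
    src=$1
    dest_dir=${2:-"$HOME/bin"}
    mkdir -p "$dest_dir" &&
        cp "$src" "$dest_dir/" &&
        chmod +x "$dest_dir/$(basename "$src")"
}
```

The explicit chmod is there because the copy should be executable regardless of the permissions the packaged file ended up with.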