Eliminating the need for packages in base R?

I know one of the reasons R is so popular is its amazing packages. But for data security reasons, I can't install packages on my work computer. So it got me thinking: could I still make R do what I would typically make it do with packages, using just base R, since packages are, after all, collections of functions? I am wondering if it is possible to run regression models and make charts in base R (without using, say, ggplot2, caret, etc.). Is it possible to copy the functions in these packages into base R to get the same functionality out of base R as one would get from the packages? Is the list of functions published as part of these packages available somewhere publicly, by chance?

I am wondering if it is possible to run regression models and make charts in base R (without using, say, ggplot2, caret, etc.).
Yes. Before ggplot2 was invented, R was generally praised for publication-ready graphics. R comes with great plotting capabilities without ggplot2, even though the latter is definitely an improvement.
Obviously, people used R for regression decades before caret was invented. A base R installation comes with a solid set of linear and nonlinear regression methods, but obviously all those packages (well, most of them) have a reason to exist. It will mainly depend on what you plan to do. Many things are implemented in a base installation; many are not.
You can find lists of packages included with all binary distributions of R here: https://cran.r-project.org/doc/manuals/r-release/R-FAQ.html#Add_002don-packages-in-R
You will find that this list not only includes the stats package but also lots of useful modelling packages like MASS, splines, boot, mgcv, nlme, cluster, rpart, spatial and survival, so a large number of even specialized models is at hand without downloading additional packages.
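For example, here is a minimal sketch of a regression plus a chart using only what ships with R (the stats, graphics and datasets packages):

    # Fit a linear model on the built-in mtcars data set
    fit <- lm(mpg ~ wt, data = mtcars)
    summary(fit)                       # coefficients, R^2, p-values

    # Base graphics scatter plot with the fitted line
    plot(mpg ~ wt, data = mtcars,
         xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
         main = "Regression with base R only")
    abline(fit, col = "red", lwd = 2)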
Is it possible to copy the functions in these packages into base R to get the same functionality out of base R as one would if they were using the packages?
Many packages contain just plain R code; others contain code in other languages, mostly C and C++, which needs a compiler to be translated on your system. However, where the use of foreign code/packages is considered a security breach, you should refrain from doing that and talk to your employer.
If it is not considered a problem, but they do not want to make exceptions for you and your installation -- I was in the same place for quite some time, and I just ran R from a USB stick. If that is allowed and feasible on your system, you can download packages into that USB stick installation.
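A minimal sketch of how that can look; the drive letter and folder here are hypothetical, adjust them to your portable installation:

    lib_dir <- "E:/R-portable/library"                 # hypothetical path
    dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

    .libPaths(c(lib_dir, .libPaths()))                 # search it first
    install.packages("ggplot2", lib = lib_dir)         # install into it
    library(ggplot2)                                   # now loads from the stick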

Related

Difference between using "gr()" via Plots and installing GR.jl package

I have installed the Plots.jl package and am using the gr() command.
Then I came across GR.jl at https://gr-framework.org/julia.html
I am confused about the difference between the two.
Could someone please shed some light on this?
Thanks,
GR.jl is a plotting package in Julia, actually a Julia wrapper around the whole GR framework, a really fast and powerful plotting framework with front- and backend capabilities. It is entirely useful and usable on its own.
Plots.jl is a meta-plotting package in Julia, which aims to provide a convenient, terse syntax for creating plots with a number of different plotting packages. Plots thus does not do any plotting itself - it takes your input commands and translates them into calls to other plotting packages, called "backends". This is currently implemented for 5 different packages: PyPlot, GR, Plotly, PGFPlots and InspectDR. GR is by far the most widely used backend, though (and currently the default).
A goal of Plots is to allow package owners to define "recipes", which are descriptions of how to plot a custom type (such as a Shapefile, a Phylogeny, a Cluster object, etc.) without depending on Plots. This makes it possible to plot such types with Plots while the defining package stays independent of, and does not interfere with, any other plotting packages.
So, though GR is usable on its own, many users find that the higher-level syntax of Plots is nicer in everyday use, and enjoy the extra usefulness of recipes.

How to calculate in R with variables

I'm an R newbie.
Is there a way I can calculate
(x+x^2+x^3)^2
in R?
So I will get the result:
x^6+2 x^5+3 x^4+2 x^3+x^2
I get an Error: object 'x' not found.
Thanks!
R isn't well suited for this. Some interface packages to languages and libraries that are better at it do exist, such as rSymPy, which gives you access to the SymPy Python library for symbolic mathematics (you'll need to install both). In a similar vein, Ryacas links to the yacas computer algebra system.
Those interfaces are useful if you need symbolic manipulation as part of an R workflow. Otherwise, consider using the original tools directly. The ones above are open source and freely available, and other alternatives exist, such as the proprietary, web-based Wolfram Alpha (free for limited use).
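If you only need the expanded coefficients rather than a symbolic expression, base R can compute them numerically, because multiplying polynomials amounts to convolving their coefficient vectors. A minimal sketch:

    # Coefficients of p(x) = x + x^2 + x^3, ordered x^0, x^1, x^2, x^3
    p <- c(0, 1, 1, 1)
    # Squaring the polynomial = open (non-circular) convolution of p with itself
    sq <- round(convolve(p, rev(p), type = "open"))
    sq  # 0 0 1 2 3 2 1  ->  x^2 + 2*x^3 + 3*x^4 + 2*x^5 + x^6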

"Standard" R benchmarking code?

I am recompiling/upgrading my R install and I want to measure performance pre/post upgrade. Is there possibly a standard script to run to time some commonly used functions and libraries? I have already installed rbenchmark, but I am just not enough of an R user to know what type of code to write to properly benchmark the new installation.
I've seen people use R-benchmark-25 as an overall test of R.
When I compile BLAS libraries, I use something like what I posted here to benchmark matrix operations from various packages.
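As a rough sketch (not the R-benchmark-25 script itself), you can time a few BLAS/LAPACK-bound operations with the rbenchmark package you already have, run the same script before and after the upgrade, and compare:

    library(rbenchmark)

    set.seed(42)
    n <- 1000L
    A <- matrix(rnorm(n * n), n, n)

    benchmark(
      matmul = A %*% A,            # dense matrix multiply (BLAS)
      chol   = chol(crossprod(A)), # Cholesky of a positive-definite matrix
      solve  = solve(A),           # LU-based inversion
      svd    = svd(A),             # singular value decomposition (LAPACK)
      replications = 3
    )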

Bivariate Poisson Regression in R?

I found a package 'bivpois' for R which fits a model for two related Poisson processes (for example, the number of goals by the home and the away team in a soccer game). However, this package seems to no longer be usable in newer versions of R.
Is there a reasonable way to modify the glm() function to do something similar, or to run this older package on my new version of R? I have found very little literature on these sorts of processes, and very little in terms of easy implementation in other statistical packages like Stata.
Any suggestions would be much appreciated.
While CRAN does not host a current binary of bivpois, you can build the package from the archived source code (see http://cran.r-project.org/doc/manuals/R-exts.html#Checking-and-building-packages). Building bivpois 0.50-3.1 from source (available at http://cran.r-project.org/src/contrib/Archive/bivpois/) works for me on R 2.15.0 Windows x64. The zipped Windows binary I built is available here: http://commondatastorage.googleapis.com/jthetzel-public/bivpois_0.50-3.1.zip
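Installing from the archived source can be scripted; a minimal sketch, using the version number and archive location from above (on Windows you will need Rtools for any compiled code):

    url <- "http://cran.r-project.org/src/contrib/Archive/bivpois/bivpois_0.50-3.1.tar.gz"
    tf  <- file.path(tempdir(), basename(url))
    download.file(url, tf)
    install.packages(tf, repos = NULL, type = "source")
    library(bivpois)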
You can also refer to "Odds modelling and testing inefficiency of sports-bookmakers", as I had modified the relevant functions inside the bivpois package.

Parallel programming for all R packages

Do you know if there are any plans to introduce parallel programming in R for all packages?
I'm aware of some developments, such as Revolution R and the parallel programming packages, but they seem to have specialised functions which replace the most popular functions (linear programming etc.). However, one of the great things about R is the huge number of specialised packages which pop up every day and make complex and time-consuming analyses very easy to run. Many of these use very popular functions such as the generalised linear model, but also use the results for additional calculation and comparison, and finally sort out the output. As far as I understand, you need to define which parts of a function can be run in parallel, so this is probably why most specialised R packages don't have this functionality and cannot have it unless the code is edited.
Are there any plans (or any packages) to enable all the most popular R functions to run in parallel, so that all the less popular functions built on top of them can run in parallel as well? For example, the package difR uses the glm function for most of its functions; if the glm function were enabled to run in parallel (or re-written and then released in a new R version) for all multi-processor machines, then there would be no need to re-write the difR package, and it could run some of its most burdensome procedures with the aid of parallel programming on a Windows PC.
I completely agree with Paul's answer.
In addition, a general system for parallelization needs some very non-trivial calibration, even for those functions that can be easily parallelized: what if you have a call stack of several functions that offer parallel computation (e.g. you are bootstrapping some model fitting, the model fitting may already offer parallelization, and low-level linear algebra can be implicitly parallel)? You need to estimate (or choose manually) at which level explicit parallelization should be done, and you possibly have implicit parallelization on top of that, so you need to trade off between these.
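Explicit parallelization is therefore typically left to the user at the outermost level. A minimal sketch using the base parallel package (shipped with R since 2.14.0), bootstrapping a glm fit as in the question; the data set and formula here are just placeholders:

    library(parallel)

    boot_coef <- function(i) {
      d   <- datasets::mtcars                  # ships with every R install
      idx <- sample(nrow(d), replace = TRUE)
      coef(glm(am ~ wt + hp, data = d[idx, ], family = binomial))
    }

    cl  <- makeCluster(2)                 # socket cluster; works on Windows too
    res <- parLapply(cl, 1:200, boot_coef)
    stopCluster(cl)
    colMeans(do.call(rbind, res))         # average bootstrap coefficients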
However, there is one particularly easy and general way to parallelize computations implicitly in R: linear algebra can be parallelized and sped up considerably by using an optimized BLAS. Using one can (depending on your system) be as easy as telling your package manager to install the optimized BLAS, and R will use it. Once it is linked to R, all packages that use base linear algebra functions like %*%, crossprod, solve etc. will profit.
See e.g. Dirk Eddelbuettel's gcbd package and its vignette, and also the discussions of how to use GotoBLAS2 / OpenBLAS.
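To check whether an optimized BLAS is actually being picked up, a quick sanity check (sessionInfo() reports the BLAS/LAPACK libraries in use on R >= 3.4.0):

    sessionInfo()               # look for the BLAS/LAPACK library paths

    n <- 2000L
    A <- matrix(rnorm(n * n), n, n)
    system.time(crossprod(A))   # should drop markedly with a threaded BLAS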
How to parallelize a certain problem is often non-trivial. Therefore, a specific implementation has to be made in each and every case, here for each R package. So I do not think a general implementation of parallel processing in R will be made, or is even possible.
