lineprof equivalent for Rcpp - r

The lineprof package in R is very useful for profiling which parts of function take up time and allocate/free memory.
Is there a lineprof() equivalent for Rcpp ?
I currently use std::chrono::steady_clock and such to get chunk timings out of an Rcpp function. Alternatives? Does Rstudio IDE provide some help here?

To supplement #Dirk's answer...
If you are working on OS X, the Time Profiler Instrument, part of Apple's Instruments set of instrumentation tools, is an excellent sampling profiler.
Just to fix ideas:
A sampling profiler lets you answer the question, what code paths does my program spend the most time executing?
A (full) cache profiler lets you answer the question, which are the most frequently executed code paths in my program?
These are different questions -- it's possible that your hottest code paths are already optimized enough that, even though the total number of instructions executed in that path is very high, the amount of time required to execute them might be relatively low.
If you want to use instruments to profile C++ code / routines used in an R package, the easiest way to go about this is:
Create a target, pointed at your R executable, with appropriate command line arguments to run whatever functions you wish to profile:
Set the command line arguments to run the code that will exercise your C++ routines -- for example, this code runs Rcpp:::test(), to instrument all of the Rcpp test code:
Click the big red Record button, and off you go!
I'll leave the rest of the instructions in understanding instruments + the timing profiler to your google-fu + the documentation, but (if you're on OS X) you should be aware of this tool.

See any decent introduction to high(er) performance computing as eg some slides from (older) presentation of my talks page which include worked examples for both KCacheGrind (part of the KDE frontend to Valgrind) as well as Google Perftools.
In a more abstract sense, you need to come to terms with the fact that C++ != R and not all tools have identical counterparts. In particular Rprof, the R profiler which several CRAN packages for profiling build on top of, is based on the fact that R is interpreted. C++ is not, so things will be different. But profiling compiled is about as old as compiling and debugging so you will find numerous tutorials.

Related

Speeding up package load in Julia

I wrote a program to solve a linear program in Julia using GLPKMathProgInterface and JuMP. The Julia code is being called by python program which runs multiple instances of the Juila code through multiple command line calls. While I'm extremely happy with the performance of the actual solver the initialization is extremely slow. I was wondering if there were approaches to speed this up.
For example if I just save the following to a file
#time using DataFrames, CSV, GLPKMathProgInterface, JuMP, ArgParse
and run it
mylabtop:~ me$ julia test.jl
12.270137 seconds (6.54 M allocations: 364.537 MiB, 3.05% gc time)
This seems extremely slow, is there some good way to speed up using modules like a precompile step I could do once?
Since you haven't gotten any answers yet, let me give you the general first order answers - although I hope someone more qualified will answer your question in more detail (and correct me if I'm wrong).
1) Loading packages in Julia is sometimes rather slow up to the time of this writing. It has been discussed many times and you can expect improvements in the future. AFAIK this will happen in early 1.x releases after 1.0 is out. Have a look at this thread.
2) Since you typically only have to pay the loading time cost once per Julia session one approach is to keep the session running for as long as possible. You can execute your script with include("test.jl") from within the session. Let me also mention the amazing Revise.jl - it's hardly possible to overemphasize this package!
3) (I have no experience with this more difficult approach.) There is PackageCompiler.jl which allows you to compile a package into your system image. Read this blog post by Simon.
4) (Not recommended) There has also been the highly experimental static-julia which statically compiles your script into an shared library and executable.
Hope that helps.

CPLEX -linear-optimization-program for Unix?

The linear-optimization course 2.3140 requires CPLEX but it is pain to use because poorly-documented and hard-to-get any information when a brick wall like here and here, let alone not having the software locally.
Does there exist some linear-optimization -tool by which I could program like with CPLEX? Since I haven't used this tool for a year, I have forgotten a lot of trivial things. Now trying to find some tool that I could run even in my Debian comp or Apple -comp, any tool or lib existing?
Trial 1: Trying to find GUI -tool to execute code like this
Trying to understand how the CPLEX works from IBM Academic Initiative. In uni, I have some sort of Eclipse CPLEX -thing but I found only this -- where can I get the GUI thing for some Unix? Image here.
There is a ton of documentation available from ibm. If you want the software on your local machine and are a student, you can get it through the academic initiative. If you want to try something different and are a student, you can get gurobi, which has a python interface you might like.
I would recommend you to look at the COIN-OR website here:
http://www.coin-or.org/
They provide well-documented libraries and solvers (I use CPLEX mostly, so I don't use those much, but it is well documented and looks really good).
CPLEX alone does a lot of things, but for a linear programming course you will probably only need a tool to solve linear programs, and maybe mixed-integer problems (MIP).
Have a look at CMPL from coin, this may be enough for you; if you need to write "real" programs, you will have to use a (C or C++) library. They provide CoinMP for MIPs, and Clp for linear programs (simplex, barrier algorithms).
I have also used GLPK (from the GNU project) for linear programs, but it performs poorly for MIP (the default branch-and-bound procedure is very simple), although it may be enough for your course:
http://www.gnu.org/software/glpk/
However, I don't really agree with you about the fact that CPLEX documentation is poor..
Python
I haven't tested CVXOPT but my teacher mocked it, apparently a bit buggy, more here.

Converting models in Matlab/R to C++/Java

I would like to convert an ARIMA model developed in R using the forecast library to Java code. Note that I need to implement only the forecasting part. The fitting can be done in R itself. I am going to look at the predict function and translate it to Java code. I was just wondering if anyone else had been in a similar situation before and managed to successfully use a Java library for the same.
Along similar lines, and perhaps this is a more general question without a concrete answer; What is the best way to deal with situations where in model building can be done in Matlab/R but the prediction/forecasting needs to be done in Java/C++? Increasingly, I have been encountering such a situation over and over again. I guess you have to bite the bullet and write the code yourself and this is not generally as hard as writing the fitting/estimation yourself. Any advice on the topic would be helpful.
You write about 'R or Matlab' to 'C++ or Java'. This gives 2 x 2 choices which is too many degrees of freedom for my taste. So allow me to concentrate on C++ as the target.
Let's consider a simpler case: Prototyping in R, and deploying in C++. If and when the R package you use is actually implemented in C or C++, this becomes pretty easy. You "merely" need to disentangle the routine you are after from its other dependencies (header files, defines, data structures, ...) and provide it with the data and parameters needed. I have done that in the past for production systems.
Here, you talk about the forecast package. This happens to depend on the RcppArmadillo package which itself brings the nice Armadillo C++ library to R. So chances are you can in fact re-write this as a self-contained unit.
Armadillo is also interesting when you want to port Matlab to C++ as it is written to help with exactly that task in mind. I have ported some relatively extensive Matlab code to C++ and reaped a substantial speed gain.
I'm not sure whether this is possible in R, but in Matlab you can interact with your Matlab code from Java - see http://www.cs.virginia.edu/~whitehouse/matlab/JavaMatlab.html. This would enable you to leave all the forecasting code in Matlab and have e.g. an interface written in Java.
Alternatively, you might want to have predictive code written in Java so that you can produce a model and then distribute a program that uses the model without having a dependency on Matlab. The Matlab compiler maybe be useful here, but I've never used it.
A final simple way of interacting messily between Matlab and Java would be (on linux) using pseudoterminals where you would have a pty/tty pair to interface Java and Matlab. In this case you would send data from Java to Matlab, and have Matlab return the forecasting results. I expect this would also work in R, but I don't know the syntax.
In general though, reimplementing the code is a decent solution and probably quicker than learning how to interface java+matlab or create Matlab libraries.
Some further information on the answer given by Richante: Matlab has some really nice capabilities for interop with compiled languages such as C/C++, C#, and Java. In your particular case you might find the toolbox Matlab Builder JA to be particularly relevant. It allows you to export your Matlab code directly to Java, meaning you can directly call code that you've constructed during your model-building phase in Matlab from Java.
More information from the Mathworks here.
I am also concerned with converting "R to Java" so will speak to that part.
As Vincent Zooneykind said in his comment - the PMML library in R makes sense for model export in general but "forecast" is not a supported library as of yet.
An alternative is to use something like https://www.opencpu.org/ to make a call to R from your java program. It surfaces the R code on a http server. Can then just call it with parameters as with a normal http call and return what is neede using java.net.HttpUrlConnection or a choice of http libraries available in Java.
Pros: Separation of concerns, no need to re-write the R code
Cons: Invoking an R server in your live process so need to make sure that is handled robustly

R package that automatically uses several cores?

I have noticed that R only uses one core while executing one of my programs which requires lots of calculations. I would like to take advantage of my multi-core processor to make my program run faster.
I have not yet investigated the question in depth but I would appreciate to benefit from your comments because I do not have good knowledge in computer science and it is difficult for me to get easily understandable information on that subject.
Is there a package that allows R to automatically use several cores when needed?
I guess it is not that simple.
R can only make use of multiple cores with the help of add-on packages, and only for some types of operation. The options are discussed in detail on the High Performance Computing Task View on CRAN
Update: From R Version 2.14.0 add-on packages are not necessarily required due to the inclusion of the parallel package as a recommended package shipped with R. parallel includes functionality from the multicore and snow packages, largely unchanged.
The easiest way to take advantage of multiprocessors is the multicore package which includes the function mclapply(). mclapply() is a multicore version of lapply(). So any process that can use lapply() can be easily converted to an mclapply() process. However, multicore does not work on Windows. I wrote a blog post about this last year which might be helpful. The package Revolution Analytics created, doSMP, is NOT a multi-threaded version of R. It's effectively a Windows version of multicore.
If your work is embarrassingly parallel, it's a good idea to get comfortable with the lapply() type of structuring. That will give you easy segue into mclapply() and even distributed computing using the same abstraction.
Things get much more difficult for operations that are not "embarrassingly parallel".
[EDIT]
As a side note, Rstudio is getting increasingly popular as a front end for R. I love Rstudio and use it daily. However it needs to be noted that Rstudio does not play nice with Multicore (at least as of Oct 2011... I understand that the RStudio team is going to fix this). This is because Rstudio does some forking behind the scenes and these forks conflict with Multicore's attempts to fork. So if you need Multicore, you can write your code in Rstuido, but run it in a plain-Jane R session.
On this question you always get very short answers. The easiest solution according to me is the package snowfall, based on snow. That is, on a Windows single computer with multiple cores. See also here the article of Knaus et al for a simple example. Snowfall is a wrapper around the snow package, and allows you to setup a multicore with a few commands. It's definitely less hassle than most of the other packages (I didn't try all of them).
On a sidenote, there are indeed only few tasks that can be parallelized, for the very simple reason that you have to be able to split up the tasks before multicore calculation makes sense. the apply family is obviously a logical choice for this : multiple and independent computations, which is crucial for multicore use. Anything else is not always that easily multicored.
Read also this discussion on sfApply and custom functions.
Microsoft R Open includes multi-threaded math libraries to improve the performance of R.It works in Windows/Unix/Mac all OS type. It's open source and can be installed in a separate directory if you have any existing R(from CRAN) installation. You can use popular IDE Rstudio also with this.From its inception, R was designed to use only a single thread (processor) at a time. Even today, R works that way unless linked with multi-threaded BLAS/LAPACK libraries.
The multi-core machines of today offer parallel processing power. To take advantage of this, Microsoft R Open includes multi-threaded math libraries.
These libraries make it possible for so many common R operations, such as matrix multiply/inverse, matrix decomposition, and some higher-level matrix operations, to compute in parallel and use all of the processing power available to reduce computation times.
Please check the below link:
https://mran.revolutionanalytics.com/rro/#about-rro
http://www.r-bloggers.com/using-microsoft-r-open-with-rstudio/
As David Heffernan said, take a look at the Blog of revolution Analytics. But you should know that most packages are for Linux. So, if you use windows it will be much harder.
Anyway, take a look at these sites:
Revolution. Here you will find a lecture about parallerization in R. The lecture is actually very good, but, as I said, most tips are for Linux.
And this thread here at Stackoverflow will disscuss some implementation in Windows.
The package future makes it extremely simple to work in R using parallel and distributed processing. More info here. If you want to apply a function to elements in parallel, the future.apply package provides a quick way to use the "apply" family functions (e.g. apply(), lapply(), and vapply()) in parallel.
Example:
library("future.apply")
library("stats")
x <- 1:10
# Single core
y <- lapply(x, FUN = quantile, probs = 1:3/4)
# Multicore in parallel
plan(multiprocess)
y <- future_lapply(x, FUN = quantile, probs = 1:3/4)

What useful R package doesn't currently exist? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I have been working on a few R packages for some general tools that aren't currently available in R: blogging, report delivery, logging, and scheduling. This led me to wonder: what are the most important things that people wish existed in R that currently aren't available?
My hope is that we can use this to pinpoint some gaps, and possibly work on them collaboratively.
I'm a former Mathematica junkie, and one thing that I really miss is the notebook style interface. When I did my research with notebooks, papers would almost write themselves as I did my analysis. But now that I'm using R, I find that documenting my work to be quite tedious.
For people that are not so familiar with Mathematica, you have documents called "notebooks" that can contain code, text, equations, and the results from executed code (which can be equations, text, graphics, or interactive tools). Everything can be neatly organized into styled subsections or sections that are collapsable. You can have multiple open documents that integrate with a single shared kernel.
While I don't think a full-blown Mathematica style interface is entirely necessary, some interactive document system that would support text (for description), code, code output, and embedded image output would be a real boon to researchers.
A Real-Time R package would be my choice, using C Streaming perhaps.
Also I'd like a more robust web development package. Nothing as extensive as Ruby on Rails but something a bit better than Sweave combined with R2HTML, that can run on RApache. I think this needs to be a huge area of emphasis for R in general.
I realize LaTeX is better markup for certain academia but in general I think HTML should be the markup language of choice. More needs to be done in terms of R Web Apps, so applications can be hosted on huge RAM remotely and R can start being used for SaaS data applications and other graphics choices.
Interfaces to any of the new-fangled 'Web 2.0' databases that use key-value pairs rather than the standard RDMS. A non-exhaustive list (in alphabetical order) would be
Cassandra Project
CouchDB
MongoDB
Project Voldemort
Redis
Tokyo Cabinet
and it would of course be nice if we had a DBI-alike abstraction on top of this. Jeff has started with RBerkeley but that use the older-school Oracle BerkeleyDB backend rather than one of those new things.
An output device which produces Javascript code, perhaps using the protovis library.
as a programmer and writer of libraries for colleagues, I was definitely missing a logging package, I googled and asked around, here too, then wrote one myself. it is on r-forge, here, and it s called "logging" :)
I use it and I'm obviously still developing it.
There are few libraries to interface with database in general, and there is not ORM library.
RMySQL is useful, but you have to write the SQL queries manually and there is not a way to generate them as in a ORM. Morevoer, it is only specific to MySQL.
Another library set that R still doesn't have, for me, it is a good system for reading command line arguments: there is R getopt but it is nothing like, for example, argparse in python.
A natural interface to the .NET framework would be awesome, though I suspect that that might be a lot of work.
EDIT:
Syntax highlighting from within RGui would also be wonderful.
ANOTHER EDIT:
R.NET now exists to integrate R with .NET.
A FRAQ package for FRequently Asked Questions, a la fortune(). R-help would be so much fun: "Try this, library(FRAQ); faq("lattice won't print"), etc.
See also.
A wiki package that adds wiki-like documentation to R packages. You'd have a inst/wiki subdirectory with plain text files in markdown, asciidoc, textile, with embedded R code. With the right incantation, these files would be executed (think brew and/or asciidoc packages), and the relevant output uploaded to a given repository online (github, googlecode, etc.). Another function could take care of synchronizing the changes made online, typically via svn or git.
Suddenly you have a wiki documentation for your package with reproducible examples (could even be hooked to R CMD check).
EDIT 2012:
... and now the knitr package would make this process even easier and neater
I would like to see a possibility to embed another programming language within R in a more straightforward way by the users. I give this as an example in some common-lisp implementations one could write a function with embedded C code like this:
(defun sample (x)
(ffi:c-inline (n1 n2) (:int :int) (values :int :int) "{
int n1 = #0, n2 = #1, out1 = 0, out2 = 1;
while (n1 <= n2) {
out1 += n1;
out2 *= n1;
n1++;
}
#(return 0)= out1;
#(return 1)= out2;
}"
:side-effects nil))
It would be good if one could write an R function with embedded C or lisp code (more interested in the latter) in a similar way.
A native .NET interface to RGUI. R(D)Com is based on COM, and it only allows to exchange matrices, not more complex structures.
I would very much like a line profiler. This exists in Matlab and Python, and is very useful for finding bits of code that take a lot of time or are executed more (or less) than expected. A lot of my code involves function optimizations and how many times something iterates may not be known in advance (though most iterations are constrained or specified).
The call stack is useful if all of your code is in R and is very simple, but as I recently posted about it, it takes a painstaking effort if your code is complex.
It's quite easy to develop a line profiler for a given bit of code. A naive way is to index every line (or just pre-specified sections) and insert a call to log proc.time() that line. In a loop, I simply enumerate sections of code and store in a 2 dimensional list the proc.time values for section i in iteration k. [See update below: this isn't actually a way to do a line profiler for all kinds of code.]
One can use such a tool to find hotspots, anomalies (e.g. code that should be O(n) but is really O(n^2)), code that may benefit from memoization (a line profiler doesn't tell you this, but it lets you know where to look), code that is mistakenly inside a loop, and more.
Update 1: Inserting a timing line between every function line is slightly erroneous: the definition of a line of code is not simply code separated by whitespace. Being able to parse the code into an AST is necessary for knowing where operations begin and end. As discussed in some of the answers to this question, there are some tools (namely, showTree and walkCode in the codetools package) for doing this. Simply applying a regular expression to source code would be a very bad thing to do.

Resources