R 2.14 byte compile - why not?

Why wouldn't I byte-compile all the packages I install? Is there some consequence of byte-compiling that makes it a decision worth thinking about?

One negative is that you can't debug byte-compiled code. On the flip side, once the code is production ready, in theory you wouldn't need that (and you could reinstall it without byte compilation if you needed to).

In R version 2.14, a major downside of byte-compiling was that it could slow down certain functions. Two other downsides were increased package size and longer installation time.
For the current version of R (3.3.X), I have yet to find a downside to byte-compiling.

Currently the development version of R already byte-compiles all packages by default, so one does not have to turn byte-compilation on in the DESCRIPTION file. A related answer mentions overheads of byte-compilation - it is possible but rare that byte-compilation would harm performance (it can happen when code is loaded that will never be used - the JIT won't compile it, but the loader still loads it; hopefully this can be addressed in the future).
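For released versions of R where byte-compilation of installed packages is not yet the default, here is a minimal sketch of how to opt in explicitly (the package name "somepkg" is a placeholder):
# Ask for byte-compilation when installing a source package
# (this passes the --byte-compile flag through to R CMD INSTALL):
install.packages("somepkg", type = "source", INSTALL_opts = "--byte-compile")
# Package authors can instead add the following line to the DESCRIPTION file,
# so the package is byte-compiled wherever it is installed:
#   ByteCompile: true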
browser() and debugging with byte-compiled code work, from the user's perspective, the same way as with non-compiled code. Internally the debugger runs on the AST of the program (bypassing the byte code), but this is not visible to the user.
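A minimal sketch of that point, using the compiler package that ships with R (the function f here is just an example):
library(compiler)

f <- function(x) {
  s <- 0
  for (i in seq_along(x)) s <- s + x[i]
  s
}
fc <- cmpfun(f)   # byte-compiled copy; printing fc shows a "<bytecode: ...>" line

debug(fc)         # stepping works exactly as it does for the uncompiled f
fc(1:10)
undebug(fc)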

Related

How to find a memory leak in Meteor App

I made a little app and deployed it to an Ubuntu server using Meteor Up.
There are very few users each day (<10), but after a few days a lot of the server's memory is used.
So I think there is a memory leak somewhere in my code.
How can I find it?
Thanks a lot!
We actually had a memory leak in the last few days, and it was relatively easy to find using a package called heapdump; you can find it here: https://www.npmjs.com/package/heapdump
It is not made specifically for Meteor, but for Node.js. Just read through the README carefully to install it. Afterwards, find a good moment to take the first heapdump by running kill -USR2 <pid_of_meteor_app> on the server. A good moment is when there is not much going on on the server, but enough so that the memory is leaking.
After a while, when you have seen a good amount of memory growth without a logical explanation, take another heapdump and download both.
Hit F12 to open the dev tools in your browser (Chrome, Firefox, Edge, ...) and go to the Memory tab there. Import both heapdumps.
Now you need to find what changed between those two heapdumps; what helped me understand how to do that was this article: https://www.useanvil.com/blog/engineering/isolating-memory-leak-in-node/
Remember that you are most probably looking for memory reservations of the same size, sometimes just tiny amounts of kB as in our case, but hundreds of thousands of them, so sorting by size is a good idea.
In our case it was an outdated package called tslib which reserved all the memory after a day or so. We were on 2.3.1, so we went to https://github.com/microsoft/tslib/releases/tag/2.4.0 and read there:
This release includes the __classPrivateFieldIn helper as well as an update to __createBinding to reduce indirection between multiple re-exports.
We updated the package, which was a dependency of another package, and that fixed it.
Kadira, Monti APM and the like are often of little use in such cases; more often than not you cannot really track down the source with them.
There is a Kadira package to check your app; have a look: https://kadira.io/

How to switch R architectures dynamically in RStudio

In RStudio there's a Tools menu which allows you to select an installed version/architecture of R under Global Options.
That's great, but my issue with that is that, as the name implies, it is a Global option, so once you select a different architecture (or version number) you then have to restart RStudio and it applies to all of your RStudio instances and projects.
This is a problem for me because:
I have some scripts within a given project that strictly require 32-bit R because they interface with 32-bit databases, such as Hortonworks' Hadoop.
I have other scripts within the same project that strictly require 64-bit R, due to (a) the availability of certain packages and (b) memory limits being prohibitively small in 32-bit R on my OS.
We can call that "Issue #1". It's also a problem because I have certain projects which require a specific architecture, even though all the scripts within a given project use the same architecture (which should theoretically be an easier problem to solve; call that "Issue #2").
If we can solve Issue #1 then Issue #2 is solved as well. If we can solve Issue #2 I'll still be better off, even if Issue #1 is unsolved.
I'm basically asking if anyone has a hack, work-around, or better workflow to address this need for frequently switching architectures and/or needing to run different architectures in different R/RStudio sessions simultaneously for different projects on a regular basis.
I know that this functionality would probably represent a feature request for RStudio and if this question is not appropriate for StackOverflow for that reason then let me know and I'll delete it. I just figured that a lot of other people probably have this issue, so maybe someone has found a work-around/hack?
There's no simple way to do this, but there are some workarounds. One you might consider is launching the correct bit-flavor of R from the current bit-flavor of R via system2 invoking Rscript.exe, e.g. (untested code):
source32 <- function(file) {
  # Path to the 32-bit Rscript.exe; adjust for your installed R version
  system2("C:\\Program Files\\R\\R-3.1.0\\bin\\i386\\Rscript.exe", normalizePath(file))
}
...
# Run a 64 bit script
source("my64.R")
# Run a 32 bit script
source32("my32.R")
Of course that doesn't really give you a 32 bit interactive session so much as the ability to run code as 32 bit.
One other tip: If you hold down CTRL while launching RStudio, you can pick the R flavor and bitness to launch on startup. This will save you some time if you're switching a lot.
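One small safeguard that pairs well with either approach is to have each script check which architecture it is actually running under and stop early if it is the wrong one; a sketch (the message text is just an example):
# 32-bit R has 4-byte pointers, 64-bit R has 8-byte pointers;
# R.version$arch (e.g. "i386" vs "x86_64") carries the same information.
if (.Machine$sizeof.pointer != 4) {
  stop("This script must be run under 32-bit R (e.g. via source32()).")
}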

Julia compiles the script every time?

The Julia language compiles the script every time; can't we compile binaries with Julia instead?
I tried a small hello-world script with the println function, and it took around 2-3 seconds for Julia to show the output! It would be better if we could make binaries instead of compiling every time.
Update: There have been some changes in Julia since I asked this question. I'm not following Julia's updates anymore, so if you're looking for something similar, look at the answers and comments below from people who are following Julia.
Also, it's good to know that it now takes around 150 ms to load a script.
Keno's answer is spot on, but maybe I can give a little more detail on what's going on and what we're planning to do about it.
Currently there is only an LLVM JIT mode:
There's a very trivial interpreter for some simple top-level statements.
All other code is jitted into machine code before execution. The code is aggressively specialized using the run-time types of the values that the code is being applied to, propagated through the program using dynamic type inference.
This is how Julia gets good performance even when code is written without type annotations: if you call f(1) you get code specialized for Int64 — the type of 1 on 64-bit systems; if you call f(1.0) you get a newly jitted version that is specialized for Float64 — the type of 1.0 on all systems. Since each compiled version of the function knows what types it will be getting, it can run at C-like speed. You can sabotage this by writing and using "type-unstable" functions whose return type depends on run-time data, rather than just types, but we've taken great care not to do that in designing the core language and standard library.
Most of Julia is written in itself, then parsed, type-inferred and jitted, so bootstrapping the entire system from scratch takes some 15-20 seconds. To make it faster, we have a staged system where we parse, type-infer, and then cache a serialized version of the type-inferred AST in the file sys.ji. This file is then loaded and used to run the system when you run julia. No LLVM code or machine code is cached in sys.ji, however, so all the LLVM jitting still needs to be done every time julia starts up, which therefore takes about 2 seconds.
This 2-second startup delay is quite annoying and we have a plan for fixing it. The basic plan is to be able to compile whole Julia programs to binaries: either executables that can be run or .so/.dylib shared libraries that can be called from other programs as though they were simply shared C libraries. The startup time for a binary will be like any other C program, so the 2-second startup delay will vanish.
Addendum 1: Since November 2013, the development version of Julia no longer has a 2-second startup delay since it precompiles the standard library as binary code. The startup time is still 10x slower than Python and Ruby, so there's room for improvement, but it's pretty fast. The next step will be to allow precompilation of packages and scripts so that those can start up just as fast as Julia itself already does.
Addendum 2: Since June 2015, the development version of Julia precompiles many packages automatically, allowing them to load quickly. The next step is static compilation of entire Julia programs.
At the moment Julia JIT-compiles its entire standard library on startup. We are aware of this and are currently working on caching the LLVM JIT output to remedy it, but until then there's no way around it (except for using the REPL).

Determining if packages are loaded in compiled form, R 2.13.0

We have just installed R version 2.13.0 as an Rserve and it seems to be working well. We want to try to take advantage of pre-compiling loaded packages using the bytecode compiler. We think we have it configured correctly, but wanted to verify.
Is there a command we can issue that shows which packages are loaded in bytecode compiled form?
By default none are, so you would have to tell us more about what you did in order for us to have a chance to assess whether you were in fact successful.
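One informal way to verify, using the compiler support available from 2.13.0 onwards (this inspects printed output rather than an official API): print a function from the package in question; a byte-compiled closure shows a "<bytecode: ...>" line below its body.
print(stats::var)   # look for a line like <bytecode: 0x...> after the body

# The same check works for your own functions:
f  <- function(x) x + 1
fc <- compiler::cmpfun(f)
f    # no <bytecode: ...> line
fc   # prints a <bytecode: ...> line, i.e. this closure carries byte code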

R package and execution time

I have developed a big library of functions in R.
For the moment I just load ("source") the functions at the beginning of all my scripts.
I have seen that I can create packages.
My question is: will that improve the execution time of my functions (by transforming interpreted code into machine language)?
What does package creation do? Does it create binaries?
Thanks
fred
There isn't a compiler to machine code for R. Packaging your R code won't improve its execution time massively, and it won't create binaries for you - you need to build those from the package tarball (or get CRAN or similar to build them for you). There is now a byte compiler for R, and R packages are now byte-compiled by default. Speed improvements are in general modest - don't expect C-like speed.
Packaging R code does just that: it packages the R code, code to be compiled (C, Fortran, etc.), man pages, documentation, tests, and so on into a standard format that can be distributed to users and installed/built on multiple architectures.
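If you do decide to create a package, a minimal sketch of getting started (file and package names are placeholders):
# Collect your existing functions and write a standard package skeleton
# (R/, man/, DESCRIPTION, ...) into ./myfuns:
source("my_functions.R")                 # assume this defines your functions
package.skeleton(name = "myfuns", list = ls())

# Then edit DESCRIPTION and the man pages, and build/install with:
#   R CMD build myfuns
#   R CMD INSTALL myfuns_1.0.tar.gz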
Packages can take advantage of things like lazy loading, so that R objects (your functions, say) are only loaded when needed, whereas source() loads them all into the global environment (by default).
If you don't intend to distribute your code then there are few benefits of packaging just for your own use, but if you do package and write documentation and examples/tests, you might be alerted to changes in the package code that break examples or cause tests to fail. That way you are better informed as to the reliability of your code, even if it is only you using it!
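To get a feel for what "modest" means in the byte-compiler point above, here is a rough sketch you can run yourself (timings will vary by machine and R version):
library(compiler)

slow_sum <- function(n) {        # deliberately loop-heavy, interpreter-unfriendly code
  s <- 0
  for (i in 1:n) s <- s + i
  s
}
slow_sum_c <- cmpfun(slow_sum)

enableJIT(0)                     # turn off the JIT so the uncompiled timing is honest
system.time(slow_sum(1e6))       # interpreted
system.time(slow_sum_c(1e6))     # byte-compiled: usually a few times faster, not C-like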

Resources