I am trying to learn how to work with nls.lm in the R library minpack.lm by using the Rosenbrock function to see if the algorithm converges to the global minimum at (x, y) = (1, 1). I do so both with and without the analytic Jacobian. In both instances, I get a warning telling me that the algorithm has reset the maximum number of iterations specified in the call to nls.lm to 1024:
Warning messages:
1: In nls.lm(par = initpar, fn = objective_rosenbrock, jac = gradient_rosenbrock, :
resetting `maxiter' to 1024!
2: In nls.lm(par = initpar, fn = objective_rosenbrock, jac = gradient_rosenbrock, :
lmder: info = -1. Number of iterations has reached `maxiter' == 1024.
The algorithm never quite reaches (1,1) as a result given my initial guess of (-1.2, 1.0). I found the source code for the library on GitHub and the following lines of code are pertinent here:
https://github.com/cran/minpack.lm/blob/master/src/nls_lm.c
OS->maxiter = INTEGER_VALUE(getListElement(control, "maxiter"));
if(OS->maxiter > 1024) {
OS->maxiter = 1024;
warning("resetting `maxiter' to 1024!");
}
Is there any logic to why the maximum number of iterations is capped to 1024? Something with bits and 2^10? I would like to use the library for a different application, but this cap on iterations might prevent that. Any insight would be appreciated.
Git blame says that this code limiting the max iterations was introduced in version 1.1-0, in 2008. The NEWS file for the package only goes back as far as version 1.1-6. I can't find the code in any public repo other than the one you point to (which is only a CRAN mirror; it doesn't contain any comments/commit messages/etc. from developers that might give us clues.)
Other than contacting the maintainer I think it's going to be hard to figure out what the rationale is for this limit.
I do have some guesses though.
The only places that maxiter is actually used in the code are here and here - in R code, not Fortran or C code, so it seems extremely unlikely that we are dealing with something like a 10-bit unsigned integer type (which seems an unlikely choice in any case). I think the limitation is there because we also have a buffer defined for holding trace information here:
double rsstrace[1024];
which, as you can see, is hard-coded to a length of 1024. Presumably bad things would happen if we tried to stuff 1025 iterations'-worth of tracing information into this array ...
My suggestions:
Change all instances of '1024' in the code to something larger and see what happens. There are only four:
$ find . -type f -exec grep -Hn 1024 {} \;
./src/nls_lm.c:141: if(OS->maxiter > 1024) {
./src/nls_lm.c:142: OS->maxiter = 1024;
./src/nls_lm.c:143: warning("resetting `maxiter' to 1024!");
./src/minpack_lm.h:20: double rsstrace[1024];
It would be best to #define MAXITER 2048 (or whatever) in src/minpack_lm.h and use that instead of the numerical value, so that the cap and the buffer length cannot drift apart.
Contact the maintainer (maintainer("minpack.lm")) and ask them about this issue.
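The guess about the trace buffer can be illustrated with a toy model. This is only a sketch in Python, not minpack.lm code: MAXITER, clamp_maxiter and run_with_trace are hypothetical names, and the stored values are placeholders. It shows why the iteration cap and the buffer length should come from one constant. In C, writing past the end of the array would silently corrupt memory rather than raise an error.

```python
MAXITER = 1024  # hypothetical single constant, mirroring the suggested #define


def clamp_maxiter(requested, limit=MAXITER):
    """Return an iteration count that fits a trace buffer of length `limit`."""
    return min(requested, limit)


def run_with_trace(requested, limit=MAXITER):
    """Fill a fixed-size trace buffer for at most `limit` iterations."""
    rsstrace = [0.0] * limit  # fixed length, like `double rsstrace[1024]`
    maxiter = clamp_maxiter(requested, limit)
    for i in range(maxiter):
        # Placeholder value standing in for the residual sum of squares.
        rsstrace[i] = 1.0 / (i + 1)
    return maxiter, rsstrace


if __name__ == "__main__":
    used, trace = run_with_trace(5000)
    print(used, len(trace))
```

Raising the limit then means changing one value, and the clamp and the buffer grow together.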
The aim is to implement a fast version of the orthogonal projective non-negative matrix factorization (opnmf) in R. I am translating the matlab code available here.
I implemented a vanilla R version but it is much slower (about 5.5x slower) than the matlab implementation on my data (~ 225000 x 150) for 20 factor solution.
So I thought using C++ might speed things up, but its speed is similar to R's. I think this can be optimized, but I am not sure how, as I am a newbie to C++. Here is a thread that discusses a similar problem.
Here is my RcppArmadillo implementation.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
Rcpp::List arma_opnmf(const arma::mat & X, const arma::mat & W0, double tol=0.00001, int maxiter=10000, double eps=1e-16) {
  arma::mat W = W0;
  arma::mat Wold = W;
  arma::mat XXW = X * (X.t()*W);
  double diffW = 9999999999.9;
  Rcpp::Rcout << "The value of maxiter : " << maxiter << "\n";
  Rcpp::Rcout << "The value of tol : " << tol << "\n";
  int i;
  for (i = 0; i < maxiter; i++) {
    // Multiplicative update; X * X.t() is never formed explicitly.
    XXW = X * (X.t()*W);
    W = W % XXW / (W * (W.t() * XXW));
    //W = W % (X*(X.t()*W)) / (W*((W.t()*X)*(X.t()*W)));
    // Clamp tiny entries to eps, then rescale by the spectral norm.
    arma::uvec idx = arma::find(W < eps);
    W.elem(idx).fill(eps);
    W = W / arma::norm(W,2);
    // Relative change in W, used as the stopping criterion.
    diffW = arma::norm(Wold-W, "fro") / arma::norm(Wold, "fro");
    if(diffW < tol) {
      break;
    } else {
      Wold = W;
    }
    if(i % 10 == 0) {
      Rcpp::checkUserInterrupt();
    }
  }
  return Rcpp::List::create(Rcpp::Named("W")=W,
                            Rcpp::Named("iter")=i,
                            Rcpp::Named("diffW")=diffW);
}
This suggested issue confirms that matlab is quite fast, so is there no hope when using R / c++?
The tests were made on Windows 10 and Ubuntu 16 with R version 4.0.0.
EDIT
After the interesting comments in the answer below, I am posting additional details. I ran tests on a Windows 10 machine with R 3.5.3 (as that's what Microsoft provides) and the comparison shows that RcppArmadillo with Microsoft's R is fastest.
Configuration                           user    system  elapsed
R                                     213.76      7.36   221.42
R with RcppArmadillo                  179.88      3.44   183.43
Microsoft's Open R                    167.33      9.96    45.94
Microsoft's Open with RcppArmadillo    85.47      4.66    23.56
Are you aware that this code is "ultimately" executed by a pair of libraries called LAPACK and BLAS?
Are you aware that Matlab ships with a highly optimised one? Are you aware that on all systems that R runs on you can change which LAPACK/BLAS is being used?
The difference matters greatly. Just this morning a friend posted this tweet contrasting the same R code running on the same Windows computer but in two different R environments. The six-times faster one "simply" uses a parallel LAPACK/BLAS implementation.
Here, you haven't even told us which operating system you are on. You can get OpenBLAS (which uses parallelism) for all OSs that R runs on. You can even get the Intel MKL (which IIRC is what Matlab uses too) fairly easily on some OSs. For Ubuntu/Debian I published a script on GitHub that does it in one step.
Lastly, many years ago I "inherited" a fast program running in Matlab on a (then-large-ish) Windows computer. I rewrote the Matlab part (carefully and slowly, it's effort) in C++ using RcppArmadillo, leading to a few factors of improvement, and because we could run that (now open source) code in parallel from R on the same computer, we gained another few factors. Together it was orders of magnitude, turning a day-long simulation into something that ran in a few minutes. So "yes, you can".
Edit: As you have access to Ubuntu, you can switch from basic LAPACK/BLAS to OpenBLAS via a single command, though I am no longer that familiar with Ubuntu 16.04 (as I run 20.04 myself).
Edit 2: Picking up the comparison from Josef's tweet, the Docker r-base container I also maintain (as part of the Rocker Project) can use OpenBLAS. [1] So once we add it, e.g. via apt-get install libopenblas-dev, the timing of a simple repeated matrix crossproduct moves from
root@0eb44b1fcc06:/# Rscript -e 'v <- matrix(1:1e6,1e3); system.time(replicate(10, crossprod(v,v)))'
user system elapsed
9.289 0.084 9.373
root@0eb44b1fcc06:/#
to
root@67bd334f53d4:/# Rscript -e 'v <- matrix(1:1e6,1e3); system.time(replicate(10, crossprod(v,v)))'
user system elapsed
2.259 2.370 0.447
root@67bd334f53d4:/#
which is substantial.
I'm attempting to run some fairly deep recursive code in R and it keeps giving me this error:
Error: C stack usage is too close to the limit
My output from Cstack_info() is:
Cstack_info()
size current direction eval_depth
67108864 8120 1 2
I have plenty of memory on my machine, I'm just trying to figure out how I can increase the CStack for R.
EDIT: Someone asked for a reproducible example. Here's some basic sample code that causes the problem. If you run f(1,1) a few times, you'll get the error. Note that I've already set --max-ppsize=500000 and options(expressions=500000), so if you don't set those you might get an error about one of those two things instead. As you can see, the recursion can go pretty deep here, and I've got no idea how to get it to work consistently. Thanks.
f <- function(root=1,lambda=1) {
  x <- c(0,1);
  prob <- c(1/(lambda+1),lambda/(lambda+1));
  repeat {
    if(root == 0) {
      break;
    }
    else {
      child <- sample(x,2,replace=TRUE,prob);
      if(child[1] == 0 && child[2] == 0) {
        break;
      }
      if(child[1] == 1) {
        child[1] <- f(root=child[1],lambda);
      }
      if(child[2] == 1 && child[1] == 0) {
        child[2] <- f(root=child[2],lambda);
      }
    }
    if(child[1] == 0 && child[2] == 0) {
      break;
    }
    if(child[1] == 1 || child[2] == 1) {
      root <- sample(x,1,replace=TRUE,prob);
    }
  }
  return(root)
}
The stack size is an operating system parameter, adjustable per-process (see setrlimit(2)). You can't adjust it from within R as far as I can tell, but you can adjust it from the shell before starting R, with the ulimit command. It works like this:
$ ulimit -s # print default
8192
$ R --slave -e 'Cstack_info()["size"]'
size
8388608
8388608 = 1024 * 8192; R is printing the same value as ulimit -s, but in bytes instead of kilobytes.
$ ulimit -s 16384 # enlarge stack limit to 16 megs
$ R --slave -e 'Cstack_info()["size"]'
size
16777216
To make a permanent adjustment to this setting, add the ulimit command to your shell startup file, so it's executed every time you log in. I can't give more specific directions than that, because it depends on exactly which shell you have and stuff. I also don't know how to do it for logging into a graphical environment (which will be relevant if you're not running R inside a terminal window).
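The unit conversion above can be checked outside R as well. The following is a small sketch in Python, assuming a Unix-like system where the standard resource module is available; the helper names stack_limit_bytes and kib_to_bytes are made up for illustration. It reads the same per-process soft limit that ulimit -s reports, but in bytes.

```python
import resource


def kib_to_bytes(kib):
    """Convert a `ulimit -s` value (units of 1024 bytes) to bytes."""
    return kib * 1024


def stack_limit_bytes():
    """Return the soft stack limit in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    limit = stack_limit_bytes()
    if limit is None:
        print("stack limit: unlimited")
    else:
        print("stack limit:", limit, "bytes =", limit // 1024, "KiB")
```

Run from a shell where ulimit -s prints 8192, this should report the same 8388608 bytes that Cstack_info() shows.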
I suspect that, regardless of stack limit, you'll end up with recursions that are too deep. For instance, with lambda = Inf, f(1) leads to an immediate recursion, indefinitely. The depth of the recursion seems to be a random walk, with some probability r of going deeper, 1 - r of finishing the current recursion. By the time you've hit the stack limit, you've made a large number of steps 'deeper'. This implies that r > 1 / 2, and the very large majority of time you'll just continue to recurse.
Also, it seems like it is almost possible to derive an analytic or at least numerical solution even in the face of infinite recursion. One can define p as the probability that f(1) == 1, write implicit expressions for the 'child' states after a single iteration, and equate these with p, and solve. p can then be used as the chance of success in a single draw from a binomial distribution.
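To make the random-walk picture concrete, here is a sketch in Python. It models only the simplified walk described above, not the exact behaviour of f: each step goes one level deeper with probability r and returns one level with probability 1 - r. For such a walk started at depth 1, the standard gambler's-ruin result gives a chance of min(1, (1 - r) / r) of ever unwinding completely. The function names and the depth cutoff are choices made for this illustration.

```python
import random


def return_probability(r):
    """Chance that a walk started at depth 1 ever gets back to depth 0."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    return min(1.0, (1.0 - r) / r)


def simulate_return_fraction(r, walks=5000, depth_cutoff=100, seed=1):
    """Estimate the same chance by simulation.

    A walk that reaches `depth_cutoff` is counted as never returning,
    which stands in for hitting the stack limit.
    """
    rng = random.Random(seed)
    returned = 0
    for _ in range(walks):
        depth = 1
        while 0 < depth < depth_cutoff:
            depth += 1 if rng.random() < r else -1
        if depth == 0:
            returned += 1
    return returned / walks


if __name__ == "__main__":
    for r in (0.4, 0.5, 0.6, 0.75):
        print(r, return_probability(r), simulate_return_fraction(r))
```

Under this model, any r above 1/2 leaves a positive chance that the recursion never unwinds, so no finite stack limit is enough for every run.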
This error is not due to memory; it is due to recursion. A function is calling itself, which isn't always obvious from examining the definition of only one function. To illustrate the point, here is a minimal example of two functions that call each other:
change_to_factor <- function(x){
  x <- change_to_character(x)
  as.factor(x)
}
change_to_character <- function(x){
  x <- change_to_factor(x)
  as.character(x)
}
change_to_character("1")
Error: C stack usage 7971600 is too close to the limit
The functions will continue to call each other recursively and will theoretically never complete; even if you increase the limit, it will still be exceeded. Only checks within your system prevent this from continuing indefinitely and consuming all of the compute resources of your machine. You need to alter the functions to ensure that they won't call themselves (or each other) recursively without end.
This happened to me for a completely different reason. I accidentally created a superlong string while combining two columns:
output_table_subset = mutate(big_data_frame,
combined_table = paste0(first_part, second_part, col = "_"))
instead of
output_table_subset = mutate(big_data_frame,
combined_table = paste0(first_part, second_part, sep = "_"))
It took me forever to figure it out, as I never expected the paste to have caused the problem.
I encountered the same problem of receiving the "C stack usage is too close to the limit" error (albeit for another application than the one stated by user2045093 above). I tried zwol's proposal but it didn't work out.
To my own surprise, I could solve the problem by installing the newest version of R for OS X (currently: version 3.2.3) as well as the newest version of R Studio for OS X (currently: 0.99.840), since I am working with R Studio.
Hopefully, this may be of some help to you as well.
One issue here can be that you're calling f inside itself:
plop <- function(a = 2){
  pouet <- sample(a)
  plop(pouet)
}
plop()
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
Mine is perhaps a more unique case, but may help the few who have this exact problem:
My case has absolutely nothing to do with space usage, still R gave the:
C stack usage is too close to the limit
I had defined a function that is an upgrade of the base function:
saveRDS()
But I accidentally named this function saveRDS() instead of safe_saveRDS().
Thus, past that definition, when the code got to the line which actually uses saveRDS(...) (which was meant to call the original base version, not the upgraded one), it gave the above error and crashed.
So, if you're getting that error when calling some saving function, check whether you accidentally overwrote it.
On Linux, I permanently increased the stack and memlock limits as follows:
sudo vi /etc/security/limits.conf
Then, add the following lines at the end of the file.
* soft memlock unlimited
* hard memlock unlimited
* soft stack unlimited
* hard stack unlimited
For everyone's information, I am suddenly running into this with R 3.6.1 on Windows 7 (64-bit). It was not a problem before, and now stack limits seem to be popping up everywhere, when I try to "save(.)" data or even do a "save.image(.)". It's like the serialization is blowing these stacks away.
I am seriously considering dropping back to 3.6.0. Didn't happen there.
I often include a commented-out source("path/to/file/thefile.R") line at the top of an R script, e.g. thefile.R, so I can easily copy-paste this into the terminal to run it. I get this error if I forget to comment out the line, since running the file runs the file, which runs the file, which runs the file, ...
If that is the cause, the solution is simple: comment out the line.
Not sure if we're listing issues here, but it happened to me with leaflet().
I was trying to map a dataframe in which a date column was of class POSIXlt.
Changing back to POSIXct solved the issue.
As Martin Morgan wrote, the problem is that you get too deep inside the recursion. If the recursion does not converge at all, you need to break it on your own. I hope this code is going to work, because it is not tested. However, at least the point should be clear here.
f <- function(root=1,lambda=1,depth=1) {
  # Give up once the recursion is more than 256 calls deep.
  if(depth > 256){
    return(NA)
  }
  x <- c(0,1);
  prob <- c(1/(lambda+1),lambda/(lambda+1));
  repeat {
    if(root == 0) {
      break;
    } else {
      child <- sample(x,2,replace=TRUE,prob);
      if(child[1] == 0 && child[2] == 0) {
        break;
      }
      if(child[1] == 1) {
        child[1] <- f(root=child[1],lambda,depth+1);
      }
      # Pass the depth-limit signal up before child[1] is compared again;
      # `== NA` is always NA, so is.na() is needed here.
      if(is.na(child[1])) {
        return(NA)
      }
      if(child[2] == 1 && child[1] == 0) {
        child[2] <- f(root=child[2],lambda,depth+1);
      }
    }
    if(is.na(child[2])) {
      return(NA)
    }
    if(child[1] == 0 && child[2] == 0) {
      break;
    }
    if(child[1] == 1 || child[2] == 1) {
      root <- sample(x,1,replace=TRUE,prob);
    }
  }
  return(root)
}
If you're using plot_ly check which columns you are passing. It seems that for POSIXdt/ct columns, you have to use as.character() before passing to plotly or you get this exception!
Here is how I encountered this error message: I tried to print a data.table in the console. It turned out that I had mistakenly made a super long string (by using collapse in paste() when I shouldn't have) in a column.
The package caret has a function called createDataPartition that always results in error when the dataset to be partitioned has more than 1m rows.
Just for your info.
I faced the same issue. This problem won't be solved by reinstalling R or Rstudio or by increasing the stack size. Here is a solution that solved this problem -
If you are sourcing a.R inside b.R and at the same time sourcing b.R inside a.R, then the stack will fill up very fast.
Problem
This is the first file a.R in which b.R is sourced
#---- a.R File -----
source("/b.R")
...
...
#--------------------
This is the second file b.R, in which a.R is sourced
#---- b.R File -----
source("/a.R")
...
...
#--------------------
Solution
Source only one file to avoid the recursive calling of files within each other
#---- a.R File -----
source("/b.R")
...
...
#--------------------
#---- b.R File -----
...
...
#--------------------
OR
#---- a.R File -----
...
...
...
#--------------------
#---- b.R File -----
source("/a.R")
...
...
#--------------------
Another way to cause the same problem:
library(debug)
mtrace(lapply)
The recursive call isn't as obvious here.
I am trying to run this example on a fresh julia installation (Version 1.0.2 (2018-11-08)):
https://github.com/JuliaOpt/JuMP.jl/blob/master/examples/basic.jl
But I always get this error.
julia> using JuMP, Clp
julia> m = Model(with_optimizer(Clp.Optimizer))
ERROR: UndefVarError: with_optimizer not defined
Stacktrace:
[1] top-level scope at none:0
What am I doing wrong? It seems such a simple example should run quite easily.
You are looking at the example from the master branch of the GitHub repository. There have been breaking changes in the JuMP API since its last release.
You should look at basic.jl file in your local repository. It should be located in a directory location like ~/.julia/packages/JuMP/Xvn0n/examples/basic.jl (the Xvn0n part might be different in your case but the path pattern should be the same; if you are on Windows then ~ is a directory of your user profile).
The example you are referring to looks like this in the released version of the package:
using JuMP, Clp
m = Model(solver = ClpSolver())
@variable(m, 0 <= x <= 2)
@variable(m, 0 <= y <= 30)
@objective(m, Max, 5x + 3y)
@constraint(m, 1x + 5y <= 3.0)
print(m)
status = solve(m)
println("Objective value: ", getobjectivevalue(m))
println("x = ", getvalue(x))
println("y = ", getvalue(y))
You can also find the zipped sources of the latest release here https://github.com/JuliaOpt/JuMP.jl/releases/tag/v0.18.4, but of course as new releases are published the number will change, so the most reliable place to look is the examples that JuMP has installed on your local machine.
Upon trying to calculate precision@k, I get an exception. What follows is simple code that reproduces the problem.
First the code defines the variable scope:
initializer = tf.random_uniform_initializer(-0.1, 0.1, seed=1234)
with tf.variable_scope("model", reuse=None, initializer=initializer):
Then it calls those lines:
predictions = tf.Variable(tf.ones([2, 10], tf.int64))
labels = tf.Variable(tf.ones([2, 1], tf.int64))
precision = tf.contrib.metrics.streaming_sparse_precision_at_k(predictions, labels, 5)
tf.initialize_all_variables().run()
(I know this code is meaningless, and tries to calculate the precision given 2 fixed matrices...)
Then I get the following exception:
W tensorflow/core/framework/op_kernel.cc:936] Failed precondition:
Attempting to use uninitialized value
model/precision_at_5/false_positive_at_5 [[Node:
model/precision_at_5/false_positive_at_5/read = IdentityT=DT_DOUBLE,
_class=["loc:@model/precision_at_5/false_positive_at_5"], _device="/job:localhost/replica:0/task:0/gpu:0"]]
The same goes when I tried to invoke streaming_sparse_recall_at_k instead of streaming_sparse_precision_at_k.
The installed version is r0.10 on linux with python 2.7.
Please help... Thanks in advance :)
Unfortunately, tf.initialize_all_variables() doesn't initialize "local" variables (which tend to be internal implementation details for ops like tf.contrib.metrics.streaming_sparse_precision_at_k() and tf.train.string_input_producer(), as opposed to variables used as model weights).
You'll need to add a line to your program that runs tf.initialize_local_variables() before running the evaluation op:
sess.run(tf.initialize_local_variables()) # or `tf.initialize_local_variables().run()`