Mismatching results for singular fit with different R/lme4 versions - r
I am trying to match the estimate of random effects from R version 3.5.3 (lme4 1.1-18-1) to R version 4.1.1 (lme4 1.1-27.1). However, there is a small difference of random effects between these two versions when there is singular fit. I'm fine with singularity warnings, but it is puzzling that different versions of R/lme4 produce slightly different results.
The following scripts are from R version 3.5.3 (lme4 1.1-18-1) and R version 4.1.1 (lme4 1.1-27.1) with the dataset Arabidopsis from lme4.
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-51.1 compiler_3.5.3 Matrix_1.2-15
[5] tools_3.5.3 Rcpp_1.0.1 splines_3.5.3 nlme_3.1-137
[9] grid_3.5.3 nloptr_1.2.1 lme4_1.1-18-1 lattice_0.20-38
> library(lme4)
Loading required package: Matrix
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
Groups Name Std.Dev.
reg:popu (Intercept) 7.744768797534
reg (Intercept) 10.629179104291
Residual 39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> fit2@theta
[1] 0.150979711638631 0.000000000000000 0.189968995915902
[4] 0.260818869156072
> VarCorr(fit2)
Groups Name Std.Dev.
reg:popu:amd:status (Intercept) 5.841181759473
reg:popu:amd (Intercept) 0.000000000000
reg:popu (Intercept) 7.349619506926
reg (Intercept) 10.090696322743
Residual 38.688521100461
> ##########
> #Example3#
> ##########
> devfun353 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> save.image('myEnvironment353.Rdata')
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-54 compiler_4.1.1 minque_2.0.0 Matrix_1.3-4
[6] tools_4.1.1 Rcpp_1.0.7 tinytex_0.34 splines_4.1.1 nlme_3.1-152
[11] grid_4.1.1 xfun_0.27 nloptr_1.2.2.2 boot_1.3-28 lme4_1.1-27.1
[16] ADDutil_2.2.1.9005 lattice_0.20-44
> library(lme4)
Loading required package: Matrix
Warning message:
package ‘lme4’ was built under R version 4.1.2
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
Groups Name Std.Dev.
reg:popu (Intercept) 7.744768797534
reg (Intercept) 10.629179104291
Residual 39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
boundary (singular) fit: see ?isSingular
> fit2@theta
[1] 0.150979743348540 0.000000000000000 0.189969036985684 0.260818797487214
> VarCorr(fit2)
Groups Name Std.Dev.
reg:popu:amd:status (Intercept) 5.841182965248
reg:popu:amd (Intercept) 0.000000000000
reg:popu (Intercept) 7.349621069388
reg (Intercept) 10.090693513643
Residual 38.688520961140
> ##########
> #Example3#
> ##########
> devfun411 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> load('myEnvironment353.Rdata')
> devfun353 <- lme4:::mkdevfun(environment(devfun353))
> minqa::bobyqa(c(1,1,1,1),devfun353,0,control = list(iprint=2))
npt = 6 , n = 4
rhobeg = 0.2 , rhoend = 2e-07
start par. = 1 1 1 1 fn = 6443.44054431489
rho: 0.020 eval: 11 fn: 6393.61 par: 0.00000 0.621363 0.744867 0.823498
rho: 0.0020 eval: 38 fn: 6361.97 par:0.156855 0.00000 0.190090 0.234676
rho: 0.00020 eval: 49 fn: 6361.94 par:0.150719 0.00000 0.190593 0.249106
rho: 2.0e-05 eval: 67 fn: 6361.94 par:0.150988 0.00000 0.189943 0.260821
rho: 2.0e-06 eval: 74 fn: 6361.94 par:0.150980 0.00000 0.189965 0.260811
rho: 2.0e-07 eval: 82 fn: 6361.94 par:0.150980 0.00000 0.189969 0.260819
At return
eval: 90 fn: 6361.9381 par: 0.150980 0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898
objective: 6361.93810274656
number of function evaluations: 90
> minqa::bobyqa(c(1,1,1,1),devfun411,0,control = list(iprint=2))
npt = 6 , n = 4
rhobeg = 0.2 , rhoend = 2e-07
start par. = 1 1 1 1 fn = 6443.44054431489
rho: 0.020 eval: 11 fn: 6393.61 par: 0.00000 0.621363 0.744867 0.823498
rho: 0.0020 eval: 38 fn: 6361.97 par:0.156855 0.00000 0.190090 0.234676
rho: 0.00020 eval: 49 fn: 6361.94 par:0.150719 0.00000 0.190593 0.249106
rho: 2.0e-05 eval: 67 fn: 6361.94 par:0.150988 0.00000 0.189943 0.260821
rho: 2.0e-06 eval: 74 fn: 6361.94 par:0.150980 0.00000 0.189965 0.260811
rho: 2.0e-07 eval: 82 fn: 6361.94 par:0.150980 0.00000 0.189969 0.260819
At return
eval: 90 fn: 6361.9381 par: 0.150980 0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898
objective: 6361.93810274656
number of function evaluations: 90
When the model is simpler, there is no singularity warning and the results match (see Example 1 in both scripts). When the model is relatively complex, there is a singularity warning and the results are slightly off (see Example 2 in both scripts). The difference is <1e-5 in this case, but I have observed <1e-4 before. Can anyone shed some light on why the results are slightly different? And is it even possible to match the results to at least 1e-8?
Not sure if this is useful, but I also extracted devfun from 3.5.3 and ran it in 4.1.1. The results match (see Example 3). In addition, when I read the iteration history from BOBYQA, the $\theta$ of the term that leads to the singularity warning oscillates between 0 and small numbers (around 1e-7 to 1e-9).
This post discusses similar topics. It also shows that a singularity warning leads to slightly different estimates. There is no obvious change in the lme4 NEWS that would cause the difference. This FAQ and ?isSingular give a great explanation of the singularity warning but do not address the mismatch directly.
TL;DR: Sometimes when there is a singularity warning (which I am OK with), the random effects are slightly different under different R/lme4 versions. Why is this happening and how can I address it?
This is a hard problem to solve in general, and even a fairly hard problem to solve in specific cases.
I think the difference arose between version 1.1.27.1 and 1.1.28, probably from this NEWS item:
construction of interacting factors (e.g. when f1:f2 or f1/f2 occur in random effects terms) is now more efficient for partially crossed designs (doesn't try to create all combinations of f1 and f2) (GH #635 and #636)
My guess is that this changes the ordering of the components in the Z matrix, which in turn means that results of various linear algebra operations are not identical (e.g. floating point arithmetic is not associative, so while binary addition is commutative (a + b == b + a), left-to-right evaluation of a sum may not be the same as right-to-left evaluation ((a+b) + c != a + (b+c)) ...)
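To make the non-associativity point concrete, here is a minimal base-R sketch. The values are chosen purely for illustration and are not taken from lme4; the point is only that regrouping a sum of doubles can change the last bits of the result, which is the kind of perturbation a reordered Z matrix could plausibly feed into the optimizer.

```r
# Illustrative values only; nothing here comes from lme4 itself.
x <- 0.1
y <- 0.2
z <- 0.3

left  <- (x + y) + z   # left-to-right grouping
right <- x + (y + z)   # right-to-left grouping

identical(x + y, y + x)          # addition is commutative: TRUE
identical(left, right)           # but not associative: FALSE
isTRUE(all.equal(left, right))   # equal within all.equal's default tolerance: TRUE
abs(left - right)                # tiny, on the order of 1e-16
```

Differences this small are harmless on their own, but an iterative optimizer near a boundary can amplify them into differences in later digits of the estimates.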
My attempt at reproducing the problem uses the same version of R ("under development 2022-02-25 r81818") and compares only lme4 package versions 1.1.18.1 and 1.1.28.9000 (development); any upstream packages such as Rcpp, RcppEigen, Matrix use the same versions. (I had to backport a few changes from the development version of lme4 to 1.1.18.1 to get it to install under the most recent version of R, but I don't think any of those modifications would affect numerical results.)
I did the comparison by installing different versions of the lme4 package before running the code in a fresh R session. My results differed between versions 1.1.18.1 and 1.1.28 less than yours did (both fits were singular, and the relative differences in the theta estimates were of the order of 2e-7 — still greater than your desired 1e-8 tolerance but much smaller than 1e-4 ...)
The results from 1.1.18.1 and 1.1.27.1 were identical.
Q1: Why are your results more different between versions than mine?
in general/anecdotally, numerical results on Windows are slightly more unstable/differ more from other platforms
there are more differences between your two test platforms than among mine: R version, upstream packages (Matrix/Rcpp/RcppEigen/minqa), possibly the compiler versions and settings used to build everything [all of which could make a difference]
Q2: how should one deal with this kind of problem?
as a minor frame challenge, why (other than not understanding what's going on, which is a perfectly legitimate reason to be concerned) does this worry you? The differences in the results are way smaller than the magnitude of statistical uncertainty, and differences this large are also likely to occur across different platforms (OS/compiler version/etc.) even for otherwise identical environments (versions of R, lme4, and other packages).
you could revert to version 1.1.27.1 for now ...
I do take the differences between 1.1.27.1 and 1.1.28 as a bug, of sorts; at the very least it's an undocumented change in the package. If it were sufficiently high-priority I could investigate the code changes described above and see if there is a way to fix the problems they addressed without breaking backward compatibility (in theory this should be possible, but it could be annoyingly difficult ...)
## R CMD INSTALL ~/R/misc/lme4
library(lme4)
packageVersion("lme4")
## 1.1.18.1
fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
dput(getME(fit2, "theta"))
t1 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072
)
Run under 1.1.28.9000 (fresh R session, re-run package-loading/lmer code above)
## R CMD INSTALL ~/R/pkgs/lme4git/lme4
packageVersion("lme4")
## [1] ‘1.1.28.9000’
dput(getME(fit2, "theta"))
t2 <- c(`reg:popu:amd:status.(Intercept)` = 0.15097974334854, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189969036985684, `reg.(Intercept)` = 0.260818797487214
)
(t1-t2)/((t1+t2)/2)
## reg:popu:amd:status.(Intercept) reg:popu:amd.(Intercept)
## -2.100276e-07 NaN
## reg:popu.(Intercept) reg.(Intercept)
## -2.161920e-07 2.747841e-07
The second element is NaN because both versions give singular fits (0/0 == NaN).
Run under 1.1.27.1 (fresh R session, re-run package-loading/lmer code above)
## remotes::install_version("lme4", "1.1-27.1")
t3 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072)
identical(t1, t3) ## TRUE
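Since bit-identical results across versions may not be attainable, one option is to compare estimates with an explicit tolerance instead of identical(). The sketch below reuses the theta values reported above (names dropped for brevity); the helper name rel_diff and the two tolerance values are assumptions chosen for illustration, not anything provided by lme4.

```r
# Theta estimates copied from the output above (lme4 1.1.18.1 vs 1.1.28.9000).
t1 <- c(0.150979711638631, 0, 0.189968995915902, 0.260818869156072)
t2 <- c(0.15097974334854,  0, 0.189969036985684, 0.260818797487214)

# Hypothetical helper: symmetric relative difference, defined as 0 when both
# values are exactly 0 (this avoids the 0/0 == NaN for the singular component).
rel_diff <- function(x, y) {
  denom <- (abs(x) + abs(y)) / 2
  ifelse(denom == 0, 0, abs(x - y) / denom)
}

rd <- rel_diff(t1, t2)
max(rd)          # largest relative difference across components
all(rd < 1e-6)   # the two versions agree at a 1e-6 tolerance
all(rd < 1e-8)   # but not at the stricter 1e-8 tolerance

# Base R's all.equal() does a similar check using a mean relative difference.
isTRUE(all.equal(t1, t2, tolerance = 1e-6))
```

A check like this makes the acceptable level of agreement explicit in a test suite, rather than relying on exact equality that can break with any upstream change.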
Related
Basic operations in R giving different results on Windows and Linux
I have been running some code in R and, while testing, realized the results were different on Windows and Linux. I have tried to understand why this happens, but couldn't find an answer. Let's illustrate it with an example:
These are some hard-coded values for reproducibility, always starting from a clean environment. I have checked that the bit representation of these values is exactly the same on both the Windows and the Linux machines:
data <- structure(list(x = c(0.1, 0.1, 0.1, 5, 5, 5, 10, 10, 10, 20, 20, 20),
                       y = c(0.013624804, 0.014023006, 0.013169554, 0.70540352, 0.68711807, 0.69233506, 1.4235181, 1.348244, 1.4141854, 2.779813, 2.7567347, 2.7436437)),
                  class = c("data.frame"), row.names = c(NA, 12L))
val <- c(43.3065849160736, 0.00134925463859564, 1.03218302435548, 270.328323775978)
theta <- 1.60812569803848
init <- c(b0 = 2.76836653333333, b1 = 0.0134350095, b2 = 2.15105945932773, b3 = 6.85922519794374)
Now I define a new variable W, which is again exactly the same in bit representation on Windows and Linux:
f <- function(X, b0, b1, b2, b3) {
  b0 + (b1 - b0) / (1 + exp(b2*(log(X) - log(b3))))
}
W <- 1 / f(data$x, val[1], val[2], val[3], val[4])^theta
And finally I apply an optim function:
SSw <- function(Y, X, b0, b1, b2, b3, w) {
  sum(w * (Y - f(X, b0, b1, b2, b3))^2)
}
SSw.vec <- function(par) SSw(data$y, data$x, par[1], par[2], par[3], par[4], W)
mod <- optim(init, SSw.vec, method = "L-BFGS-B", lower = c(-Inf,-Inf,-Inf,0))
print(mod$par)
# In Windows it returns:
# b0 b1 b2 b3
# 3.097283e+01 1.831543e-03 1.047613e+00 1.842448e+02
# In Linux it returns:
# b0 b1 b2 b3
# 3.459241e+01 1.530134e-03 1.040363e+00 2.101996e+02
As you can see, the differences are quite significant, but even if they weren't... just why are there any differences? Any help will be really appreciated!
Edit Here I add the sessionInfo() on both Windows and Linux. On Windows: R version 3.6.3 (2020-02-29) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045) Matrix products: default locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices datasets utils methods base loaded via a namespace (and not attached): [1] Rcpp_1.0.8.3 plyr_1.8.6 cellranger_1.1.0 compiler_3.6.3 pillar_1.7.0 nloptr_1.2.2.2 tools_3.6.3 [8] bit_4.0.4 boot_1.3-24 lme4_1.1-29 lifecycle_1.0.0 tibble_3.1.7 nlme_3.1-144 gtable_0.3.0 [15] lattice_0.20-38 pkgconfig_2.0.3 rlang_1.0.2 Matrix_1.2-18 cli_3.4.1 rstudioapi_0.11 dplyr_1.0.6 [22] generics_0.1.0 vctrs_0.3.8 lmerTest_3.1-3 grid_3.6.3 tidyselect_1.1.1 glue_1.4.2 R6_2.4.1 [29] fansi_0.4.1 readxl_1.3.1 minqa_1.2.4 ggplot2_3.3.6 purrr_0.3.5 magrittr_1.5 scales_1.1.1 [36] ellipsis_0.3.2 MASS_7.3-51.5 splines_3.6.3 colorspace_1.4-1 numDeriv_2016.8-1.1 renv_0.13.2 utf8_1.1.4 [43] munsell_0.5.0 crayon_1.3.4 On Linux: R version 3.6.3 (2020-02-29) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 10 (buster) Matrix products: default BLAS: /opt/r/lib/R/lib/libRblas.so LAPACK: /opt/r/lib/R/lib/libRlapack.so locale: [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 [4] LC_COLLATE=C LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices datasets utils methods base loaded via a namespace (and not attached): [1] compiler_3.6.3 tools_3.6.3 renv_0.13.2
I ran your code on my local machine with Fedora 37 and R 4.2.2. Like the other commenters, I got the same result you got on Windows. Then I pulled the rocker image for R version 3.6.3:
docker run -ti rocker/r-ver:3.6.3 R
This image is also Debian-based. Here I was able to recreate the result you got on your system. Then I moved to the rocker versioned release 4.0.0:
docker run -ti rocker/r-ver:4.0.0 R
Here the result was the same as the one you got on Windows and everyone else got on their machines. It must be noted that with R 4.0.0 the rocker project moved from Debian-based images to Ubuntu LTS.
Fedora comes with the ability to easily switch the BLAS/LAPACK backend via the flexiblas package. Thanks to that, I was able to test your code with the eight different backends available on my system. As you can see below, they do yield different results. In particular, the ATLAS backend comes somewhat close to the result you got. In contrast, OPENBLAS-OPENMP (the default on Fedora), the other OPENBLAS variants, and NETLIB all produce the same result as you received on Windows. A third family, BLIS, produces yet another set of possible results.
Is one of the results better than the others? Yes! optim() looks for a result that minimizes the supplied function. In its returned list, it reports not just the minimizing parameters, but also the function value at those parameters. I've included that in the table below. So the ATLAS backend wins the prize here.
It must be said that optim() does NOT minimize analytically, so it always gives approximate results. That is why initial values and the method matter for what results we get. And apparently, with your function the backend also matters. If you look at the parameters you got on Buster, the function goes to 0.002800321. So it is actually a better result than what we all get on our more modern systems, except for the result I got with ATLAS. ATLAS also happens to be much slower than the other backends.
So it seems the newer backends might have traded accuracy for speed.
If your aim is consistency across platforms, you can upgrade your system to Debian 11 Bullseye, since that appears to have a backend producing the same results as other modern platforms, as the answer by @jay.sf indicates. You could also investigate whether you can find, for Windows, the same BLAS backend version used on Buster.
Furthermore, you can try to change to another BLAS library on your current system. Here is a guide on how to do that. Though it is for Ubuntu, as both use apt, it should work for your system as well. (Edit: I tried that in a VM for Buster. None of the available BLAS backends produced the same result as on the more modern systems.)
Finally, if you feel you must have a newer BLAS library on your older system, then you could try to backport it yourself. I have no experience with this. I don't know how advisable it is or how likely to succeed. I am just mentioning it for completeness.
library(flexiblas)
library(tidyverse)
test_fun <- function(i) {
  flexiblas_switch(i)
  data <- structure(list(x = c(0.1, 0.1, 0.1, 5, 5, 5, 10, 10, 10, 20, 20, 20),
                         y = c(0.013624804, 0.014023006, 0.013169554, 0.70540352, 0.68711807, 0.69233506, 1.4235181, 1.348244, 1.4141854, 2.779813, 2.7567347, 2.7436437)),
                    class = c("data.frame"), row.names = c(NA, 12L))
  val <- c(43.3065849160736, 0.00134925463859564, 1.03218302435548, 270.328323775978)
  theta <- 1.60812569803848
  init <- c(b0 = 2.76836653333333, b1 = 0.0134350095, b2 = 2.15105945932773, b3 = 6.85922519794374)
  f <- function(X, b0, b1, b2, b3) {
    b0 + (b1 - b0) / (1 + exp(b2*(log(X) - log(b3))))
  }
  W <- 1 / f(data$x, val[1], val[2], val[3], val[4])^theta
  SSw <- function(Y, X, b0, b1, b2, b3, w) {
    sum(w * (Y - f(X, b0, b1, b2, b3))^2)
  }
  SSw.vec <- function(par) SSw(data$y, data$x, par[1], par[2], par[3], par[4], W)
  mod <- optim(init, SSw.vec, method = "L-BFGS-B", lower = c(-Inf,-Inf,-Inf,0))
  return(c(mod$par, value = mod$value))
}
flexiblas_list() |>
  setdiff("__FALLBACK__") |>
  tibble(backend = _) |>
  mutate(
    idx = flexiblas_load_backend(backend),
    res = map(idx, test_fun)
  ) |>
  unnest_wider(res)
#> # A tibble: 8 × 7
#>   backend            idx    b0      b1    b2    b3   value
#>   <chr>            <int> <dbl>   <dbl> <dbl> <dbl>   <dbl>
#> 1 NETLIB               2  31.0 0.00183  1.05  184. 0.00282
#> 2 OPENBLAS-OPENMP      1  31.0 0.00183  1.05  184. 0.00282
#> 3 ATLAS                3  34.7 0.00168  1.04  209. 0.00280
#> 4 BLIS-SERIAL          4  27.1 0.00225  1.06  158. 0.00285
#> 5 BLIS-OPENMP          5  27.1 0.00225  1.06  158. 0.00285
#> 6 BLIS-THREADS         6  27.1 0.00225  1.06  158. 0.00285
#> 7 OPENBLAS-SERIAL      7  31.0 0.00183  1.05  184. 0.00282
#> 8 OPENBLAS-THREADS     8  31.0 0.00183  1.05  184. 0.00282
I can now confirm your issue. I installed Debian Buster on a VM, did apt install r-base and got R 3.5.2, ran your code, and it showed the same (probably) "flawed" Linux results as in the OP. However, I then updated to R 4.2.2 and the "flawed" results didn't change! It used libblas3.8. On my real machine I'm running Ubuntu, R 4.2.2, libblas3.10 and get the (probably) "right" Windows results. Unfortunately I was not able to install libblas3.10 on Debian Buster and I am not sure if that's possible at all (see, it's not listed under Buster in the link). Note that Debian Bullseye is now the current version and your system is actually outdated. The outdated BLAS/LAPACK may, as noted in my comment, still be considered the cause of the erroneous results, since these are the actual algebraic engines. You may be able to install an updated BLAS/LAPACK on Debian Buster, but I tend to recommend you upgrade to Debian Bullseye.
This is a little bit tangential, but probably the easiest thing you can do to alleviate the between-platform differences is to use control = list(parscale = abs(init)) in your optim() call. The reason for this is that unless a gradient function is specified, L-BFGS-B automatically uses finite differences with a fixed stepsize (ndeps, defaulting to 1e-3 for all parameters) to approximate the gradient. This is usually good enough but can cause problems for hard/unstable optimization problems, or when the parameters are on very different scales. parscale as specified above tells optim how to scale the parameters internally, generally improving the results. It might be even better to pass an analytic (or auto-differentiated) gradient, but that's more work ...
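A sketch of that suggestion, reusing the data and starting values from the question; only the control argument is new, and the result name mod_scaled is made up for this example. Whether this removes the platform differences in any particular setup is not guaranteed.

```r
# Data, weights and objective copied from the question.
data <- data.frame(
  x = c(0.1, 0.1, 0.1, 5, 5, 5, 10, 10, 10, 20, 20, 20),
  y = c(0.013624804, 0.014023006, 0.013169554, 0.70540352, 0.68711807,
        0.69233506, 1.4235181, 1.348244, 1.4141854, 2.779813, 2.7567347,
        2.7436437)
)
val   <- c(43.3065849160736, 0.00134925463859564, 1.03218302435548, 270.328323775978)
theta <- 1.60812569803848
init  <- c(b0 = 2.76836653333333, b1 = 0.0134350095,
           b2 = 2.15105945932773, b3 = 6.85922519794374)

f <- function(X, b0, b1, b2, b3) {
  b0 + (b1 - b0) / (1 + exp(b2 * (log(X) - log(b3))))
}
W <- 1 / f(data$x, val[1], val[2], val[3], val[4])^theta
SSw <- function(Y, X, b0, b1, b2, b3, w) sum(w * (Y - f(X, b0, b1, b2, b3))^2)
SSw.vec <- function(par) SSw(data$y, data$x, par[1], par[2], par[3], par[4], W)

# Same call as in the question, plus parscale so that optim() works internally
# with par / parscale, i.e. with all parameters on a comparable scale.
mod_scaled <- optim(init, SSw.vec, method = "L-BFGS-B",
                    lower = c(-Inf, -Inf, -Inf, 0),
                    control = list(parscale = abs(init)))
mod_scaled$par
mod_scaled$value
mod_scaled$convergence   # 0 would indicate successful convergence
```

Comparing mod_scaled$value with the value from the unscaled call on each platform is a quick way to see whether the scaling actually helped.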
This is also a little bit tangential, but when I run your code on my system I get convergence=1 in the output of optim, which indicates that the iteration limit 'maxit' had been reached. 0 indicates successful completion, so maxit should be increased.
mod <- optim(init, SSw.vec, method = "L-BFGS-B", lower = c(-Inf,-Inf,-Inf,0))
mod$convergence
#[1] 1
mod$message
#[1] "NEW_X"
mod <- optim(init, SSw.vec, method = "L-BFGS-B", lower = c(-Inf,-Inf,-Inf,0), control=list(maxit=1e5))
mod$convergence
#[1] 0
mod$message
#[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
mod$par
# b0 b1 b2 b3
#4.336406e+01 1.154007e-03 1.031650e+00 2.710800e+02
mod$value
#[1] 0.002779518
Maybe this helps to get similar results on different system configurations.
Why is R making a copy-on-modification after using str?
I was wondering why R is making a copy-on-modification after using str. I create a matrix. I can change its dim, one element, or even all elements. No copy is made. But when I call str, R makes a copy during the next modification operation on the matrix. Why is this happening?
m <- matrix(1:12, 3)
tracemem(m)
#[1] "<0x559df861af28>"
dim(m) <- 4:3
m[1,1] <- 0L
m[] <- 12:1
str(m)
# int [1:4, 1:3] 12 11 10 9 8 7 6 5 4 3 ...
dim(m) <- 3:4 #Here after str a copy is made
#tracemem[0x559df861af28 -> 0x559df838e4a8]:
dim(m) <- 3:4
str(m)
# int [1:3, 1:4] 12 11 10 9 8 7 6 5 4 3 ...
dim(m) <- 3:4 #Here again after str a copy
#tracemem[0x559df838e4a8 -> 0x559df82c9d78]:
I was also wondering why a copy is made when there is a Task Callback.
TCB <- addTaskCallback(function(...) TRUE)
m <- matrix(1:12, nrow = 3)
tracemem(m)
#[1] "<0x559dfa79def8>"
dim(m) <- 4:3 #Copy on modification
#tracemem[0x559dfa79def8 -> 0x559dfa8998e8]:
removeTaskCallback(TCB)
#[1] TRUE
dim(m) <- 4:3 #No copy
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=de_AT.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_AT.UTF-8 LC_COLLATE=de_AT.UTF-8
[5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=de_AT.UTF-8
[7] LC_PAPER=de_AT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.3
This is a follow-up question to "Is there a way to prevent copy-on-modify when modifying attributes?". I start R with R --vanilla to have a clean session.
I have asked this question on R-help as suggested by @sam-mason in the comments.
The answer from Luke Tierney solved the issue with str:
As of R 4.0.0 it is in some cases possible to reduce reference counts internally and so avoid a copy in cases like this. It would be too costly to try to detect all cases where a count can be dropped, but in this case we can do better. It turns out that the internals of pos.to.env were unnecessarily creating an extra reference to the call environment (here in a call to exists()). This is fixed in r79528. Thanks.
And related to Task Callback:
It turns out there were some issues with the way calls to the callbacks were handled. This has been revised in R-devel in r79541. This example will no longer need to duplicate in R-devel. Thanks for the report.
R using lmer gives: Error in diag(vcov(object, use.hessian = use.hessian))
There is a strange behaviour when I use lmer: when I save a fit from lmer into an object, let's say fit0, I can look at the summary (output not shown):
>summary(fit0)
If I save the objects using save.image(), close the session and reopen it again, summary gives me:
>summary(fit0)
Error in diag(vcov(object, use.hessian = use.hessian))
error in evaluating the argument 'x' in selecting a method for function 'diag': Error in object@pp$unsc() : object 'merPredDunsc' not found
If I run the model again, I get the expected summary, but I lose it if I close the session. What is happening? How can I avoid this error? Thanks for your help.
Environment and version: Windows 7, R version 3.1.2 (2014-10-31), GNU Emacs 24.3.1 (i386-mingw-nt6.1.7601)/ESS
Here is a minimal example:
# j: cluster
# i[j]: i in cluster j
# yi[j] = zi[j] + N(0,1)
# zi[j] = b0j + b1*xi[j]
# b0j = g0 + u0j, u0j ~ N(0,sd0)
# b1 = const
library(lme4)
# Number of clusters (level 2)
N <- 20
# intercept
g0 <- 1
sd0 <- 2
# slope
b1 <- 3
# Number of observations (level 1) for cluster j
nj <- 10
# Vector of clusters indices 1,1...n1,2,2,2,....n2,...N,N,....nN
j <- c(sapply(1:N, function(x) rep(x, nj)))
# Vector of random variable
uj <- c(sapply(1:N, function(x)rep(rnorm(1,0,sd0), nj)))
# Vector of fixed variable
x1 <- rep(runif(nj),N)
# linear combination
z <- g0 + uj + b1*x1
# add error
y <- z + rnorm(N*nj,0,1)
# Put all together
d0 <- data.frame(j, y=y, z=z,x1=x1, uj=uj)
head(d0)
# mixed model
fit0 <- lmer(y ~ x1 + (1|j), data = d0)
vcov(fit0)
summary(fit0)
save.image()
After restarting and adding library lme4:
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lme4_1.1-7 Rcpp_0.11.0 Matrix_1.1-2-2
loaded via a namespace (and not attached):
[1] compiler_3.1.2 grid_3.1.2 lattice_0.20-29 MASS_7.3-35
[5] minqa_1.2.3 nlme_3.1-118 nloptr_1.0.4 splines_3.1.2
[9] tools_3.1.2
>
how to apply long only add.distribution parameterset in quantStrat - simpleError in param.combo[[param.label]]
I am applying a similar add.distribution rule as in the luxor demo, while my strategy has only a long position. The whole strategy works, but when applying a parameter set I get the following error:
TakeProfitLONG
47 0.047
TakeProfitLONG
47 0.047
result of evaluating expression:
simpleError in param.combo[[param.label]]: subscript out of bounds
got results for task 47
numValues: 47, numResults: 47, stopped: FALSE
returning status FALSE
evaluation # 48:
$param.combo
I am trying to run a distribution on a simple takeProfit rule (I get the same result from stopLoss or trailingStop):
.use.takeProfit = TRUE
.takeprofit <- 2.0/100 # actual
.TakeProfit = seq(0.1, 4.8, length.out=48)/100 # parameter set for optimization
## take-profit
add.rule(strategy.st, name = 'ruleSignal',
         arguments=list(sigcol='signal.gt.zero', sigval=TRUE,
                        replace=FALSE,
                        orderside='long',
                        ordertype='limit',
                        tmult=TRUE,
                        threshold=quote(.takeprofit),
                        TxnFees=.txnfees,
                        orderqty='all',
                        orderset='ocolong'
         ),
         type='chain',
         parent='EnterLONG',
         label='TakeProfitLONG',
         enabled=.use.takeProfit
)
I am adding the distribution as follows:
add.distribution(strategy.st,
                 paramset.label = 'TakeProfit',
                 component.type = 'chain',
                 component.label = 'TakeProfitLONG',
                 variable = list(threshold = .TakeProfit),
                 label = 'TakeProfitLONG'
)
and apply the set:
results <- apply.paramset(strategy.st, paramset.label='TakeProfit', portfolio.st=portfolio.st, account.st=account.st, nsamples=.nsamples, verbose=TRUE)
From my limited debugging it seems that the parameter set is a simple vector, whereas in apply.paramset the following function fails:
results <- fe %dopar% { ... }
I am too new to R here, as I have only been looking into this for 4 weeks, but possibly a call to:
install.param.combo <- function(strategy, param.combo, paramset.label)
might cause the error?
I apologize for being so new to this, but has anyone encountered this, or could anyone help with how to apply a distribution to only one item in a long-only strategy?
Many thanks in advance!
EDIT 1: SessionInfo() R version 3.1.2 (2014-10-31) Platform: i486-pc-linux-gnu (32-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] lattice_0.20-29 iterators_1.0.7 downloader_0.3 [4] quantstrat_0.9.1665 foreach_1.4.2 blotter_0.9.1644 [7] PerformanceAnalytics_1.4.3574 FinancialInstrument_1.2.0 quantmod_0.4-3 [11] TTR_0.22-0.1 xts_0.9-7 zoo_1.7-12 loaded via a namespace (and not attached): [1] codetools_0.2-9 compiler_3.1.2 digest_0.6.7 grid_3.1.2 tools_3.1.2
This is the same bug as # 5776. It was fixed for "signal" component types, but not for "chain". It should now be fixed as of revision 1669 on R-Forge.
Error using Apply Function in R on Tutorial Example
I am trying to learn how to use the apply function and I came across this tutorial: http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/ which seems clear and concise, but I'm running into a problem right away. The very first example they give to demonstrate apply is:
> # create a matrix of 10 rows x 2 columns
> m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)
> # mean of the rows
> apply(m, 1, mean)
[1] 6 7 8 9 10 11 12 13 14 15
This seems very basic, but I thought I'd give it a try. Here is my result:
> # create a matrix of 10 rows x 2 columns
> m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)
> # mean of the rows
> apply(m, 1, mean)
Error in FUN(newX[, i], ...) : unused argument(s) (newX[, i])
Needless to say, I'm lost on this one...
To provide some more information, I attempted another example provided in the tutorial and got the correct result. The difference in this case was that the function was defined explicitly inside the apply call:
apply(m, 1:2, function(x) x/2)
      [,1] [,2]
 [1,]  0.5  5.5
 [2,]  1.0  6.0
 [3,]  1.5  6.5
 [4,]  2.0  7.0
 [5,]  2.5  7.5
 [6,]  3.0  8.0
 [7,]  3.5  8.5
 [8,]  4.0  9.0
 [9,]  4.5  9.5
[10,]  5.0 10.0
sessionInfo() output is below:
R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.15.3
And the output for conflicts(details = TRUE):
$.GlobalEnv
[1] "edit" "mean"
$`package:utils`
[1] "edit"
$`package:methods`
[1] "body<-" "kronecker"
$`package:base`
[1] "body<-" "kronecker" "mean"
As others have identified, it's probably because you have a conflict on mean. When you call anything (functions, objects), R goes through the search path until it's found (and if it isn't found R will complain accordingly): > search() [1] ".GlobalEnv" "tools:RGUI" "package:stats" [4] "package:graphics" "package:grDevices" "package:utils" [7] "package:datasets" "package:methods" "Autoloads" [10] "package:base" If you're fairly new to R, note that when you create a function, unless you specify otherwise, it's usually going to live in ".GlobalEnv". R looks there first before going any further, so it's fairly important to name your functions wisely, so as not to conflict with common functions (e.g. mean, plot, summary). It's probably a good idea to start with a clean session once in a while. It's fairly common in the debugging phase to name variables x or y (names picked for convenience rather than informativeness... we're only human after all), which can be unexpectedly problematic down the line. When you have a workspace that's fairly crowded, the probability of conflicts increases, so (a) pick names carefully and (b) restart without restoring would be my advice.
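A minimal sketch of the masking described above. The zero-argument mean defined here is a hypothetical stand-in for whatever object named mean was sitting in the asker's workspace; the names res_masked, res_clean and res_explicit are likewise made up for this example.

```r
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)

# Hypothetical stand-in: a user-defined function that shadows base::mean
# and accepts no arguments.
mean <- function() 0

# apply() now picks up the user-defined mean first, so the call fails;
# the error message is captured instead of stopping the script.
res_masked <- tryCatch(apply(m, 1, mean),
                       error = function(e) conditionMessage(e))
res_masked

# Option 1: refer to the base function explicitly while the mask is in place.
res_explicit <- apply(m, 1, base::mean)

# Option 2: remove the masking object so the name resolves to base::mean again.
rm(mean)
res_clean <- apply(m, 1, mean)
res_clean
```

Either fix works; removing the stray object (or restarting without restoring the workspace) is the more durable one, since it also protects later calls that do not spell out base::mean.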