I am trying to match the random-effects estimates from R version 3.5.3 (lme4 1.1-18-1) to R version 4.1.1 (lme4 1.1-27.1). However, there is a small difference in the random effects between these two versions when the fit is singular. I'm fine with singularity warnings, but it is puzzling that different versions of R/lme4 produce slightly different results.
The following transcripts are from R version 3.5.3 (lme4 1.1-18-1) and R version 4.1.1 (lme4 1.1-27.1), using the Arabidopsis dataset that ships with lme4.
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-51.1 compiler_3.5.3 Matrix_1.2-15
[5] tools_3.5.3 Rcpp_1.0.1 splines_3.5.3 nlme_3.1-137
[9] grid_3.5.3 nloptr_1.2.1 lme4_1.1-18-1 lattice_0.20-38
> library(lme4)
Loading required package: Matrix
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
Groups Name Std.Dev.
reg:popu (Intercept) 7.744768797534
reg (Intercept) 10.629179104291
Residual 39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> fit2@theta
[1] 0.150979711638631 0.000000000000000 0.189968995915902
[4] 0.260818869156072
> VarCorr(fit2)
Groups Name Std.Dev.
reg:popu:amd:status (Intercept) 5.841181759473
reg:popu:amd (Intercept) 0.000000000000
reg:popu (Intercept) 7.349619506926
reg (Intercept) 10.090696322743
Residual 38.688521100461
> ##########
> #Example3#
> ##########
> devfun353 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> save.image('myEnvironment353.Rdata')
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-54 compiler_4.1.1 minque_2.0.0 Matrix_1.3-4
[6] tools_4.1.1 Rcpp_1.0.7 tinytex_0.34 splines_4.1.1 nlme_3.1-152
[11] grid_4.1.1 xfun_0.27 nloptr_1.2.2.2 boot_1.3-28 lme4_1.1-27.1
[16] ADDutil_2.2.1.9005 lattice_0.20-44
> library(lme4)
Loading required package: Matrix
Warning message:
package ‘lme4’ was built under R version 4.1.2
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
Groups Name Std.Dev.
reg:popu (Intercept) 7.744768797534
reg (Intercept) 10.629179104291
Residual 39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
boundary (singular) fit: see ?isSingular
> fit2@theta
[1] 0.150979743348540 0.000000000000000 0.189969036985684 0.260818797487214
> VarCorr(fit2)
Groups Name Std.Dev.
reg:popu:amd:status (Intercept) 5.841182965248
reg:popu:amd (Intercept) 0.000000000000
reg:popu (Intercept) 7.349621069388
reg (Intercept) 10.090693513643
Residual 38.688520961140
> ##########
> #Example3#
> ##########
> devfun411 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> load('myEnvironment353.Rdata')
> devfun353 <- lme4:::mkdevfun(environment(devfun353))
> minqa::bobyqa(c(1,1,1,1),devfun353,0,control = list(iprint=2))
npt = 6 , n = 4
rhobeg = 0.2 , rhoend = 2e-07
start par. = 1 1 1 1 fn = 6443.44054431489
rho: 0.020 eval: 11 fn: 6393.61 par: 0.00000 0.621363 0.744867 0.823498
rho: 0.0020 eval: 38 fn: 6361.97 par:0.156855 0.00000 0.190090 0.234676
rho: 0.00020 eval: 49 fn: 6361.94 par:0.150719 0.00000 0.190593 0.249106
rho: 2.0e-05 eval: 67 fn: 6361.94 par:0.150988 0.00000 0.189943 0.260821
rho: 2.0e-06 eval: 74 fn: 6361.94 par:0.150980 0.00000 0.189965 0.260811
rho: 2.0e-07 eval: 82 fn: 6361.94 par:0.150980 0.00000 0.189969 0.260819
At return
eval: 90 fn: 6361.9381 par: 0.150980 0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898
objective: 6361.93810274656
number of function evaluations: 90
> minqa::bobyqa(c(1,1,1,1),devfun411,0,control = list(iprint=2))
npt = 6 , n = 4
rhobeg = 0.2 , rhoend = 2e-07
start par. = 1 1 1 1 fn = 6443.44054431489
rho: 0.020 eval: 11 fn: 6393.61 par: 0.00000 0.621363 0.744867 0.823498
rho: 0.0020 eval: 38 fn: 6361.97 par:0.156855 0.00000 0.190090 0.234676
rho: 0.00020 eval: 49 fn: 6361.94 par:0.150719 0.00000 0.190593 0.249106
rho: 2.0e-05 eval: 67 fn: 6361.94 par:0.150988 0.00000 0.189943 0.260821
rho: 2.0e-06 eval: 74 fn: 6361.94 par:0.150980 0.00000 0.189965 0.260811
rho: 2.0e-07 eval: 82 fn: 6361.94 par:0.150980 0.00000 0.189969 0.260819
At return
eval: 90 fn: 6361.9381 par: 0.150980 0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898
objective: 6361.93810274656
number of function evaluations: 90
When the model is simpler, there is no singularity warning and the results match (see Example 1 in both scripts). When the model is more complex, there is a singularity warning and the results are slightly off (see Example 2 in both scripts). The difference is < 1e-5 in this case, but I have observed differences of up to about 1e-4 before. Can anyone shed some light on why the results are slightly different? And is it even possible to match the results to at least 1e-8?
Not sure if this is useful, but I also extracted the deviance function (devfun) from 3.5.3 and ran it in 4.1.1; the results match (see Example 3). In addition, when I read the iteration history from BOBYQA, the $\theta$ of the term that triggers the singularity warning oscillates between 0 and small values (around 1e-7 to 1e-9).
This post discusses similar topics; it also shows that a singular fit leads to slightly different estimates. There is no obvious change in the lme4 NEWS that would cause the difference. This FAQ and ?isSingular explain the singularity warning well, but they do not directly address the mismatch.
TL;DR: Sometimes, when there is a singularity warning (which I am OK with), the random-effects estimates differ slightly between R/lme4 versions. Why is this happening, and how can it be addressed?
This is a hard problem to solve in general, and even a fairly hard problem to solve in specific cases.
I think the difference arose between versions 1.1.27.1 and 1.1.28, probably from this NEWS item:
construction of interacting factors (e.g. when f1:f2 or f1/f2 occur in random effects terms) is now more efficient for partially crossed designs (doesn't try to create all combinations of f1 and f2) (GH #635 and #636)
My guess is that this changes the ordering of the components in the Z matrix, which in turn means that results of various linear algebra operations are not identical (e.g. floating point arithmetic is not associative, so while binary addition is commutative (a + b == b + a), left-to-right evaluation of a sum may not be the same as right-to-left evaluation ((a+b) + c != a + (b+c)) ...)
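A tiny, self-contained illustration of that non-associativity point (nothing lme4-specific here, just base-R floating point):
## The same three numbers summed in a different order give (very slightly)
## different doubles, so any reordering of the underlying linear algebra
## can shift results in the last few digits.
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)    ## FALSE
(0.1 + 0.2) + 0.3 -  (0.1 + (0.2 + 0.3))  ## ~1.1e-16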
My attempt at reproducing the problem uses the same version of R ("under development 2022-02-25 r81818") for both runs and compares only lme4 package versions 1.1.18.1 and 1.1.28.9000 (development); all upstream packages (Rcpp, RcppEigen, Matrix, etc.) are held at the same versions. (I had to backport a few changes from the development version of lme4 to 1.1.18.1 to get it to install under the most recent version of R, but I don't think any of those modifications would affect numerical results.)
I did the comparison by installing different versions of the lme4 package before running the code in a fresh R session. My results differed between versions 1.1.18.1 and 1.1.28 less than yours did (both fits were singular, and the relative differences in the theta estimates were of the order of 2e-7 — still greater than your desired 1e-8 tolerance but much smaller than 1e-4 ...)
The results from 1.1.18.1 and 1.1.27.1 were identical.
Q1: Why are your results more different between versions than mine?
in general (anecdotally), numerical results on Windows are slightly less stable and tend to differ more from results on other platforms
there are more differences between your two test platforms than among mine: R version, upstream packages (Matrix/Rcpp/RcppEigen/minqa), possibly the compiler versions and settings used to build everything [all of which could make a difference]
Q2: how should one deal with this kind of problem?
as a minor frame challenge: why (other than not understanding what's going on, which is a perfectly legitimate reason to be concerned) does this worry you? The differences in the results are far smaller than the magnitude of statistical uncertainty (see the sketch after this list), and differences this large are also likely to occur across different platforms (OS/compiler version/etc.) even for otherwise identical environments (versions of R, lme4, and other packages).
you could revert to version 1.1.27.1 for now ...
I do take the differences between 1.1.27.1 and 1.1.28 as a bug, of sorts; at the very least it's an undocumented change in the package. If it were sufficiently high-priority I could investigate the code changes described above and see if there is a way to fix the problems they addressed without breaking backward compatibility (in theory this should be possible, but it could be annoyingly difficult ...)
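To make the "statistical uncertainty" point concrete (the sketch referred to above): with fit2 fitted as in the code below, one could compare the ~2e-7 relative differences with profile confidence intervals for the variance parameters. Profiling is slow and will likely warn about the boundary (singular) component, and I have not checked the exact interval widths here; the expectation is simply that they are many orders of magnitude wider than the between-version differences.
## Sketch: statistical uncertainty of the variance parameters, for comparison
## with the ~2e-7 relative differences between lme4 versions.
confint(fit2, parm = "theta_", oldNames = FALSE)  ## profile CIs for the SDs and sigma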
## R CMD INSTALL ~/R/misc/lme4
library(lme4)
packageVersion("lme4")
## 1.1.18.1
fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
dput(getME(fit2, "theta"))
t1 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072
)
Run under 1.1.28.9000 (fresh R session, re-run package-loading/lmer code above)
## R CMD INSTALL ~/R/pkgs/lme4git/lme4
packageVersion("lme4")
## [1] ‘1.1.28.9000’
dput(getME(fit2, "theta"))
t2 <- c(`reg:popu:amd:status.(Intercept)` = 0.15097974334854, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189969036985684, `reg.(Intercept)` = 0.260818797487214
)
(t1-t2)/((t1+t2)/2)
## reg:popu:amd:status.(Intercept) reg:popu:amd.(Intercept)
## -2.100276e-07 NaN
## reg:popu.(Intercept) reg.(Intercept)
## -2.161920e-07 2.747841e-07
The second element is NaN because both versions estimate that variance component as exactly zero (both fits are singular), and 0/0 == NaN.
Run under 1.1.27.1 (fresh R session, re-run package-loading/lmer code above)
## remotes::install_version("lme4", "1.1-27.1")
t3 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072)
identical(t1, t3) ## TRUE
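If one wanted to check directly whether the construction of the interacting factors (and hence the ordering fed into the linear algebra) changed between versions, something along these lines could be run under each lme4 version with fit2 fitted as above, and the saved objects compared. This is only a sketch; I have not verified that this is exactly where the discrepancy enters.
## Under the old version: save the grouping-factor levels and the
## random-effects model matrix Z.
flist_old <- lapply(getME(fit2, "flist"), levels)
Z_old <- getME(fit2, "Z")
save(flist_old, Z_old, file = "lme4_old_structure.RData")

## Under the new version (fresh session, fit2 refitted as above):
load("lme4_old_structure.RData")
identical(flist_old, lapply(getME(fit2, "flist"), levels))  ## same levels/order?
all.equal(as.matrix(Z_old), as.matrix(getME(fit2, "Z")))    ## same Z, element-wise?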
I was wondering why R makes a copy-on-modification after using str.
I create a matrix. I can change its dim, one element, or even all elements, and no copy is made. But after I call str, R makes a copy during the next modification operation on the matrix. Why is this happening?
m <- matrix(1:12, 3)
tracemem(m)
#[1] "<0x559df861af28>"
dim(m) <- 4:3
m[1,1] <- 0L
m[] <- 12:1
str(m)
# int [1:4, 1:3] 12 11 10 9 8 7 6 5 4 3 ...
dim(m) <- 3:4 #Here after str a copy is made
#tracemem[0x559df861af28 -> 0x559df838e4a8]:
dim(m) <- 3:4
str(m)
# int [1:3, 1:4] 12 11 10 9 8 7 6 5 4 3 ...
dim(m) <- 3:4 #Here again after str a copy
#tracemem[0x559df838e4a8 -> 0x559df82c9d78]:
I was also wondering why a copy is made when a task callback is registered.
TCB <- addTaskCallback(function(...) TRUE)
m <- matrix(1:12, nrow = 3)
tracemem(m)
#[1] "<0x559dfa79def8>"
dim(m) <- 4:3 #Copy on modification
#tracemem[0x559dfa79def8 -> 0x559dfa8998e8]:
removeTaskCallback(TCB)
#[1] TRUE
dim(m) <- 4:3 #No copy
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=de_AT.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_AT.UTF-8 LC_COLLATE=de_AT.UTF-8
[5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=de_AT.UTF-8
[7] LC_PAPER=de_AT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.3
This is a follow up question to Is there a way to prevent copy-on-modify when modifying attributes?.
I start R with R --vanilla to have a clean session.
I have asked this question on R-help as suggested by @sam-mason in the comments.
The answer from Luke Tierney solved the issue with str:
As of R 4.0.0 it is in some cases possible to reduce reference counts
internally and so avoid a copy in cases like this. It would be too
costly to try to detect all cases where a count can be dropped, but in
this case we can do better. It turns out that the internals of
pos.to.env were unnecessarily creating an extra reference to the call
environment (here in a call to exists()). This is fixed in r79528.
Thanks.
And related to Task Callback:
It turns out there were some issues with the way calls to the
callbacks were handled. This has been revised in R-devel in r79541.
This example will no longer need to duplicate in R-devel.
Thanks for the report.
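For reference, the tracemem() pattern above can be reused as a quick check of whether these copies still occur on a given R build (fresh R --vanilla session; this just restates the reproductions above as a before/after test):
## Any copy shows up as a "tracemem[... -> ...]" line; on builds that include
## the two fixes (r79528 and r79541), no such line is expected below.
m <- matrix(1:12, 3)
tracemem(m)
str(m)
dim(m) <- 4:3            ## a copy here would indicate the str()/pos.to.env issue

cb <- addTaskCallback(function(...) TRUE)
m2 <- matrix(1:12, 3)
tracemem(m2)
dim(m2) <- 4:3           ## a copy here would indicate the task-callback issue
removeTaskCallback(cb)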
I'm working on a Shiny app which plots data trees. I'm looking to incorporate the shinyTree package to permit quick comparison of plotted nodes. The issue is that shinyTree returns a redundant list of lists describing the selected sub-nodes.
The actual list of lists is included below. I would like to keep only the longest branches. I would also like to remove the id node (the integer node); I'm struggling to see why it even shows up, given the list. I have tried many different methods to work with this list, but it has been a real struggle. The list concept is difficult to understand.
I create the data.tree and plot via:
dataTree.a <- FromListSimple(checkList)
plot(dataTree.a)
> checkList
[[1]]
[[1]]$Asia
[[1]]$Asia$China
[[1]]$Asia$China$Beijing
[[1]]$Asia$China$Beijing$Round
[[1]]$Asia$China$Beijing$Round$`20383994`
[1] 0
[[2]]
[[2]]$Asia
[[2]]$Asia$China
[[2]]$Asia$China$Beijing
[[2]]$Asia$China$Beijing$Round
[1] 0
[[3]]
[[3]]$Asia
[[3]]$Asia$China
[[3]]$Asia$China$Beijing
[1] 0
[[4]]
[[4]]$Asia
[[4]]$Asia$China
[[4]]$Asia$China$Shanghai
[[4]]$Asia$China$Shanghai$Round
[[4]]$Asia$China$Shanghai$Round$`23740778`
[1] 0
[[5]]
[[5]]$Asia
[[5]]$Asia$China
[[5]]$Asia$China$Shanghai
[[5]]$Asia$China$Shanghai$Round
[1] 0
[[6]]
[[6]]$Asia
[[6]]$Asia$China
[[6]]$Asia$China$Shanghai
[1] 0
[[7]]
[[7]]$Asia
[[7]]$Asia$China
[1] 0
[[8]]
[[8]]$Asia
[[8]]$Asia$India
[[8]]$Asia$India$Delhi
[[8]]$Asia$India$Delhi$Round
[[8]]$Asia$India$Delhi$Round$`25703168`
[1] 0
[[9]]
[[9]]$Asia
[[9]]$Asia$India
[[9]]$Asia$India$Delhi
[[9]]$Asia$India$Delhi$Round
[1] 0
[[10]]
[[10]]$Asia
[[10]]$Asia$India
[[10]]$Asia$India$Delhi
[1] 0
[[11]]
[[11]]$Asia
[[11]]$Asia$India
[1] 0
[[12]]
[[12]]$Asia
[[12]]$Asia$Japan
[[12]]$Asia$Japan$Tokyo
[[12]]$Asia$Japan$Tokyo$Round
[[12]]$Asia$Japan$Tokyo$Round$`38001000`
[1] 0
[[13]]
[[13]]$Asia
[[13]]$Asia$Japan
[[13]]$Asia$Japan$Tokyo
[[13]]$Asia$Japan$Tokyo$Round
[1] 0
[[14]]
[[14]]$Asia
[[14]]$Asia$Japan
[[14]]$Asia$Japan$Tokyo
[1] 0
[[15]]
[[15]]$Asia
[[15]]$Asia$Japan
[1] 0
[[16]]
[[16]]$Asia
[1] 0
Well, I did cobble together a poor hack to make this work. Here is what I did to the checkList list:
checkList <- get_selected(tree, format = "slices")
# Convert and collapse shinyTree slices to data.tree
# This is a bit of a kludge to make the graphic work with
# shinyTree; an alternate one-liner is in the works.
# This transform works by finding the longest branches
# and only plotting them since the other branches are
# subsets due to the slices.
# Extract the checkList name (as characters) from the checkList
tmp <- names(unlist(checkList))
# Determine the length of the individual checkList Names
lens <- lapply(tmp, function(x) length(strsplit(x, ".", fixed=TRUE)[[1]]))
# Find the indices of the elements with the maximum depth (the longest branches)
lens.max <- which(lens == max(sapply(lens, max)))
# Replace all '.' with '/' in preparation for the data.frame/data.tree conversion
tmp <- relist(str_replace_all(tmp, "\\.", "/"), skeleton=tmp)
# Add a root node to work with multiple branches
tmp <- unlist(lapply(tmp, function(x) paste0("Root/", x)))
# Create a list of only the longest branches
longBranches <- as.list(tmp[lens.max])
# Convert the list into a data.frame for the data.tree conversion
longBranches.df <- data.frame(pathString = do.call(rbind, longBranches))
# Publish the data.frame for use
vals$selDF <- longBranches.df
#save(checkList, file = "chkLists.RData") # Save for troubleshooting
print(vals$selDF)
The new checkList looks like this:
[1] "Root/Europe/France/Paris/Round/10843285" "Root/Europe/France/Paris/Round"
[3] "Root/Europe/France/Paris" "Root/Europe/France"
[5] "Root/Europe/Germany/Berlin/Diamond/3563194" "Root/Europe/Germany/Berlin/Diamond"
[7] "Root/Europe/Germany/Berlin/Round/3563194" "Root/Europe/Germany/Berlin/Round"
[9] "Root/Europe/Germany/Berlin" "Root/Europe/Germany"
[11] "Root/Europe/Italy/Rome/Round/3717956" "Root/Europe/Italy/Rome/Round"
[13] "Root/Europe/Italy/Rome" "Root/Europe/Italy"
[15] "Root/Europe/United Kingdom/London/Round/10313307" "Root/Europe/United Kingdom/London/Round"
[17] "Root/Europe/United Kingdom/London" "Root/Europe/United Kingdom"
[19] "Root/Europe"
It works :) ... but I think this could be done in a couple of lines. I'll work on it again in a week or so. Any other ideas would be appreciated.
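In case it helps in the meantime, here is one possible shorter version: since every shorter slice is a prefix of some longest branch, keeping only the paths that are not a prefix of any other path (the leaves) gives the same plot. This assumes, as above, that node names contain no '.' characters (names(unlist()) uses '.' as its separator); the trailing-id removal is optional.
library(data.tree)

# Full path of every slice, using "/" as separator and a common root.
paths <- paste0("Root/", gsub(".", "/", names(unlist(checkList)), fixed = TRUE))
# Keep only paths that are not a prefix of any longer path (the longest branches).
leaves <- paths[!sapply(paths, function(p) any(startsWith(paths, paste0(p, "/"))))]
# Optionally drop the trailing numeric id nodes.
# leaves <- sub("/[0-9]+$", "", leaves)

dataTree.a <- FromDataFrameTable(data.frame(pathString = leaves,
                                            stringsAsFactors = FALSE))
plot(dataTree.a)
Unlike the max-depth filter above, this also keeps maximal branches that happen to be shallower than the deepest one.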
I use an R script for parameter sweeping in a NetLogo model. The script opens the model correctly with NLStart and NLLoadModel. In the sweeping loop, I store the values of input parameters by building a list:
run.params <- list(
RD = NLReport("RD?"),
RD.unif = NLReport("RD-unif?"),
Gini = NLReport("gini"),
Gamma = NLReport("gamma"),
GROUP = NLReport("GROUP?"),
SN = NLReport("SN?"),
Group.size = NLReport("group-size"),
Sn.size = NLReport("sn-size"),
W.group = NLReport("w-group"),
W.sn = NLReport("w-sn"),
Num.sn <- NLReport("num-sn"),
LF <- NLReport("LF?"),
L.memory <- NLReport("L-memory"),
LF.agents <- NLReport("LF-agents?"),
MASS.enthusiasm <- NLReport("MASS-enthusiasm?"),
W.crowd <- NLReport("w-crowd")
)
The results are as follows:
$RD
[1] FALSE
...
$Sn.size
[1] 5
$W.group
[1] 0.05
$W.sn
[1] 0.05
[[11]]
[1] 2
[[12]]
[1] FALSE
[[13]]
[1] 5
[[14]]
[1] FALSE
[[15]]
[1] FALSE
[[16]]
[1] 0.01
I.e., from Num.sn onward the values are retrieved correctly (as checked by launching NetLogo outside R), but the names are missing. I don't know why this is happening. I would be grateful for any help with this.
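One observation about the list construction itself (independent of NetLogo, and offered only as a possible explanation): the entries up to W.sn use `=`, while the entries from Num.sn onward use `<-`. Inside a call to list(), `name <- value` assigns name in the calling environment and passes the value as an unnamed argument, which would produce exactly the pattern of missing names shown above. A minimal illustration:
## `=` creates a named list element; `<-` assigns in the caller and passes
## the value positionally, so the element ends up unnamed.
str(list(a = 1, b <- 2))
## List of 2
##  $ a: num 1
##  $  : num 2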
Can anyone help me with the format? What command should I be using, or what should I learn to solve this myself?
> d
[1] 5.5
> cat("##",d[])
## 5.5
[1] 5.5
> print("##",d)
[1] "##"
> print("##",d[])
[1] "##"
> print(d)
[1] 5.5
I am looking to match the sample output:
## [1] 5.5
cat(paste("##", capture.output(print(5.5))))
## [1] 5.5
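If the object prints on more than one line, the same idea can be applied line by line; cat_with_prefix() is just a hypothetical helper name:
## Prefix every line of the captured print() output.
cat_with_prefix <- function(x, prefix = "## ") {
  cat(paste0(prefix, capture.output(print(x))), sep = "\n")
}
cat_with_prefix(5.5)
## [1] 5.5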
There are probably better alternatives to your actual problem (which you don't define). Package knitr comes to mind.
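For instance, in a knitr / R Markdown document every chunk's printed output is prefixed automatically, and the prefix is the `comment` chunk option, whose default is already "##", so no capture.output() gymnastics are needed. A sketch:
## In a setup chunk; "##" is already the default, shown here for explicitness.
knitr::opts_chunk$set(comment = "##")
## A chunk containing just `d` then renders its output as:
##   ## [1] 5.5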