glmmPQL crashes on inclusion of corSpatial object - r

Link to data (1170 obs, 9 variables, .Rd file)
Simply read it in using readRDS(file).
I´m trying to setup a GLMM using the glmmPQL function from the MASS package including a random effects part and accounting for spatial autocorrelation. However, R (Version: 3.3.1) crashes upon execution.
library(nlme)
# setup model formula
fo <- hail ~ prec_nov_apr + t_min_nov_apr + srad_nov_apr + age
# setup corSpatial object
correl = corSpatial(value = c(10000, 0.1), form = ~ry + rx, nugget = TRUE,
fixed = FALSE, type = "exponential")
correl = Initialize(correl, data = d)
# fit model
fit5 <- glmmPQL(fo, random = ~1 | date, data = d,
correl = correl, family = binomial)
What I tried so far:
reduce number of observation
play with corSpatial parameters (range and nugget)
reduce number of fixed predictors
execute code on Windows, Linux (Debian) and Mac R installations
While I get no error message on my local pc (RStudio just crashes), running the script on a server returns the following error message:
R: malloc.c:3540: _int_malloc: Assertion (fwd->size & 0x4) == 0' failed. Aborted

I'd use the INLA package to model this, as it allows to use spatially correlated random effects. The required code is a bit too long to place here. Therefore I've place it in a document on http://rpubs.com/INBOstats/spde

Related

readRDS error from reading a R object of ~ 160MB

I’m trying to save a R object, which is a linear regression model based on ridge regression using R package glmnet. I'm using saveRDS to save it and it runs without error.
saveRDS(ridge, file = 'rnaClassifer_ridgeReg.RDdata')
HOWEVER, I cannot load the object back to R studio via readRDS, and it keeps giving errors and crashes the R session.
readRDS('rnaClassifer_ridgeReg.RDdata')
Note here this is a R object with size of 161MB after saving as rnaClassifer_ridgeReg.RDdata (which can be downloaded from here). My local laptop has 8 cores / 32 GB, which I would think is enough?
Here I'm also attaching the dataset (here) used to build the regression model, along with the code. Feel free to run the commands below to generate the R object ridge, and see if you can save it and load it successfully back to R.
library (caret)
library (glmnet)
data.lm.train = read.table('data.txt.gz', header = T, sep = '\t', quote = '', check.names = F)
lambda <- 10^seq(-3, 3, length = 100)
### ridge regression
set.seed(666)
ridge <- train(
dnaScore ~., data = data.lm.train, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneGrid = expand.grid(alpha = 0, lambda = lambda)
)
Any help would be highly appreciated!

ggcoef_model error when two random intercepts

When trying to graph the conditional fixed effects of a glmmTMB model with two random intercepts in GGally I get the error:
There was an error calling "tidy_fun()". Most likely, this is because the
function supplied in "tidy_fun=" was misspelled, does not exist, is not
compatible with your object, or was missing necessary arguments (e.g. "conf.level=" or "conf.int="). See error message below.
Error: Error in "stop_vctrs()":
! Can't recycle "..1" (size 3) to match "..2" (size 2).`
I have tinkered with figuring out the issue and it seems to be related to the two random intercepts included in the model. I have also tried extracting the coefficient and standard error information separately through broom.mixed::tidy and then feeding the data frame into GGally:ggcoef() with no avail. Any suggestions?
# Example with built-in randu data set
data(randu)
randu$A <- factor(rep(c(1,2), 200))
randu$B <- factor(rep(c(1,2,3,4), 100))
# Model
test <- glmmTMB(y ~ x + z + (0 +x|A) + (1|B), family="gaussian", data=randu)
# A few of my attempts at graphing--works fine when only one random effects term is in model
ggcoef_model(test)
ggcoef_model(test, tidy_fun = broom.mixed::tidy)
ggcoef_model(test, tidy_fun = broom.mixed::tidy, conf.int = T, intercept=F)
ggcoef_model(test, tidy_fun = broom.mixed::tidy(test, effects="fixed", component = "cond", conf.int = TRUE))
There are some (old!) bugs that have recently been fixed (here, here) that would make confidence interval reporting on RE parameters break for any model with multiple random terms (I think). I believe that if you are able to install updated versions of both glmmTMB and broom.mixed:
remotes::install_github("glmmTMB/glmmTMB/glmmTMB#ci_tweaks")
remotes::install_github("bbolker/broom.mixed")
then ggcoef_model(test) will work.

Error in lme4::allFit() -- no applicable method for 'isGLMM'

I'm hitting a confusing error while trying to run the lme4::allFit() using some built-in parallelization. I fit an initial model m0, which uses a larger dataframe ckDF (n = 265,623 rows) to model a binary response to a number of categorical and continuous predictors in a logistic framework with a random intercept for year.
I'm interested in determining whether different optimizers yield different results, following some recommendations I've found online (e.g. by #BenBolker here). My data is fairly large and takes ~20 minutes to run usually, so I'm hoping to use the parallel and ncpus parameters of allFit() to speed it up a bit. Here's my relevant code:
require(lme4)
require(parallel)
m0 <- glmer(returned ~ 1 + barge + site + barge:site +
(run + rearType + basin)^2 +
(tdg + temp + holdingTime)^2 +
(1|year),
data = ckDF, family = 'binomial',
control = glmerControl(optimizer='bobyqa',
optCtrl = list(maxfun = 1e5)))
af1 <- allFit(m0, parallel = 'multicore', ncpus = detectCores())
Upon doing this, I encounter the following error:
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: no applicable method for 'isGLMM' applied to an object of class "list"
Any ideas? It seems to me that when it constructs a bunch of nodes, somehow some of them don't import the lme4 package and thus do not recognize isGLMM(); but I don't know why allFit() would do this, since it's from lme4(). I tried looking under the hood and altering the function for my own allFit() package, but ran into other errors.
Any help would be appreciated. R Version: 3.6.1; lme4 Version: 1.1-21; platform: Windows 10 64-bit
Thanks to #user20650 & #Ben Bolker for the tips in comments above -- it worked and I was able to get allFit() to run as expected, by ensuring I use parallel = "snow" in my function call since I'm running in Windows. Just posting the edited code here for anyone else who finds this useful:
require(lme4); require(snow)
# Define initial model (switched to defaults here)
m0 <- glmer(returned ~ 1 + barge + site + barge:site +
(run + rearType + basin)^2 +
(tdg + temp + holdingTime)^2 +
(1|year),
data = ckDF, family = 'binomial')
# Set up cluster for running allFit()
optCls <- makeCluster(detectCores()-1, type = "SOCK")
clusterEvalQ(optCls,library("lme4"))
clusterExport(optCls, "ckDF")
# Use allFit() to look at differences in optimizers
system.time(af1 <- allFit(m0, parallel = 'snow',
ncpus = detectCores()-1, cl=optCls))
stopCluster(optCls)
Ended up taking ~40 minutes using 11 cores on my machine.

Extracting path coefficients of piecewise SEM (structural equation model)

I'm constructing a piecewise structural equation model using the piecewiseSEM package in R (Lefcheck - https://cran.r-project.org/web/packages/piecewiseSEM/vignettes/piecewiseSEM.html)
I already created the model set and I could evaluate the model fit, so the model itself works. Also, the data fits the model (p = 0.528).
But I do not succeed in extracting the path coefficients.
This is the error i get: Error in cbind(Xlarge, Xsmall) : number of rows of matrices must match (see arg 2)
I already tried (but this did not work):
standardising my data because of the warning: Some predictor variables are on very different scales: consider rescaling
adapted my data (threw some NA values away)
This is my modellist:
predatielijst = list(
lmer(plantgrootte ~ gapfraction + olsen_P + (1|plot_ID), data = d),
glmer(piek1 ~ gapfraction + olsen_P + plantgrootte + (1|plot_ID),
family = poisson, data = d),
glmer(predatie ~ piek1 + (1|plot_ID), family = binomial, data = d)
)
with "predatie" being a binary variable (yes or no) and all the rest continuous variables (gapfraction, plantgrootte, olsen_P & piek1)
Thanks in advance!
Try installing the development version:
library(devtools)
install_github("jslefche/piecewiseSEM#2.0")
Replace list with psem and run the coefs or summary function. It will likely get rid of your error. If not, open a bug on Github!
WARNING: this will overwrite your current version from CRAN. You will need to reinstall from CRAN to get version 1.4 back.
try to use lme (out of the nlme library) ilstead of glmer. As far as I understand, the fact that lmer does not provide p-values (while lme does) seems to be the problem here.
Hope this works.

IV Estimation with Cluster Robust Standard Errors using the plm package in R

I'm using the plm package for panel data to do instrumental variable estimation. However, it seems that calculating cluster robust standard errors by using the vcovHC() function is not supported.
More specifically, when I use the vcovHC() function, the following error message is displayed:
Error in vcovG.plm(x, type = type, cluster = cluster, l = 0, inner = >inner, :
Method not available for IV
Example:
data("Wages", package = "plm")
IV <- plm(lwage ~ south + exp | wks + south,
data = Wages, model = "pooling", index = 595)
vcvIV <- vcovHC(IV)
According to this thread, someone worked on a fix two years ago. Is there any progress on the issue? I know that the packages "lfe" and "ivpack" allow to compute cluster robust standard errors for IV estimation but none of them allows for random effects/intercepts.
In fact it's not implemented. However, you can use Schrimpf's clustered errors function which is applied directly to a object of the plm class.
Using your example:
library (plm)
data("Wages", package = "plm")
IV <- plm(lwage ~ south + exp | wks + south, data = Wages, model = "pooling", index = 595)
Wages$id <- rep(1:595, each = 7)
cl.plm(Wages, IV, Wages$id)
Where I'm using Wages$idas the panel first dimension around which clusters will be formed. You may want to compare these results with the obtained in other software. Anyway, the code is simple allowing some tricks. The cl.plm function is based on Arai's clustering notes which can help you further.
You can obtain the same result from cl.plm doing this in Stata:
ivregress 2sls lwage south (exp = wks), vce(cluster id) small
Or for the within model:
xtset id time, generic
xtivreg2 lwage south (exp = wks), fe small cluster(id)
Note however I used the small sample formulation in Stata, which is not big deal. More about this here. Anyway, cl.plm properly deals with the plm class object.
For sake of completeness: as suggested by #Helix123, you can use the development version (1.6-1) of plm package and proceed as you did in tour question.

Resources