readRDS error when reading an R object of ~160 MB

I'm trying to save an R object: a ridge regression model fitted with the R package glmnet. I save it with saveRDS, which runs without error.
saveRDS(ridge, file = 'rnaClassifer_ridgeReg.RDdata')
However, I cannot load the object back into RStudio with readRDS: it keeps giving errors and crashes the R session.
readRDS('rnaClassifer_ridgeReg.RDdata')
Note that the saved file, rnaClassifer_ridgeReg.RDdata (which can be downloaded from here), is 161 MB. My local laptop has 8 cores and 32 GB of RAM, which I would think is enough?
I'm also attaching the dataset (here) used to build the regression model, along with the code. Feel free to run the commands below to generate the R object ridge, and see whether you can save it and load it back into R successfully.
library (caret)
library (glmnet)
data.lm.train = read.table('data.txt.gz', header = T, sep = '\t', quote = '', check.names = F)
lambda <- 10^seq(-3, 3, length = 100)
### ridge regression
set.seed(666)
ridge <- train(
dnaScore ~., data = data.lm.train, method = "glmnet",
trControl = trainControl("cv", number = 10),
tuneGrid = expand.grid(alpha = 0, lambda = lambda)
)
Any help would be highly appreciated!
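One way to narrow this down is to check that a saveRDS/readRDS round trip works at all on the same machine, and to compare the in-memory size with the size on disk. The sketch below uses a small base-R lm fit as a stand-in, not the glmnet model from the question; the variable names are illustrative assumptions.

```r
# Fit a small stand-in model (base R only; not the ridge model from the question).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Write to a temporary file; saveRDS() compresses with gzip by default.
path <- tempfile(fileext = ".rds")
saveRDS(fit, file = path)

# readRDS() returns the object, so assign the result to a name.
restored <- readRDS(path)

# Check the coefficients survived, and report in-memory vs. on-disk size.
round_trip_ok <- isTRUE(all.equal(coef(fit), coef(restored)))
size_in_memory_mb <- as.numeric(object.size(fit)) / 1024^2
size_on_disk_mb <- file.size(path) / 1024^2
print(c(memory_mb = size_in_memory_mb, disk_mb = size_on_disk_mb))
```

If this works but the large object still fails, the problem is more likely tied to the contents of that particular object than to saveRDS/readRDS in general.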

Related

Parallel estimation of multiple nonparametric models using np and snowfall

I am trying to estimate multiple nonparametric models using snowfall. So far I had no problems, but now I run into a problem that I feel unable to resolve.
In the MWE below we simply estimate only one model on one node. In my application the structure is the same. When I try to plot the model results or use another function from the np package (like npsigtest()), I get the error
Error in is.data.frame(data) : ..1 used in an incorrect context, no ... to look in
Has anyone an idea what causes the problem? I am open to another approach concerning parallel estimation of several models.
MRE:
library(np)
library(snowfall)
df <- data.frame(Y = runif(100, 0, 10), X = rnorm(100))
models <- list(as.formula(Y ~ X))
sfInit(parallel = T, cpus = length(models))
sfExport("models")
sfExport("df")
sfLibrary(snowfall)
sfLibrary(np)
lcls <- sfLapply(models, fun = npregbw, data = df, regtype = "lc")
sfStop()
plot(lcls[[1]])
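One plausible cause, not confirmed in the question, is that the data argument reaches the fitting function through the `...` of the apply call, so the call stored in the fitted object refers to `..1` instead of `df`; any later function that re-evaluates that stored call then has no `...` to look in. The base-R sketch below reproduces the mechanism with lm and lapply standing in for npregbw and sfLapply; fit_one is a hypothetical wrapper.

```r
set.seed(1)
df <- data.frame(Y = runif(20, 0, 10), X = rnorm(20))
models <- list(Y ~ X)

# Variant 1: `data` travels through lapply()'s `...`, as in the question.
fits_dots <- lapply(models, lm, data = df)

# Variant 2: a hypothetical wrapper that names `data` explicitly.
fit_one <- function(f) lm(formula = f, data = df)
fits_wrapped <- lapply(models, fit_one)

# Inspect the calls that each fitted object has stored.
call_dots <- paste(deparse(fits_dots[[1]]$call), collapse = " ")
call_wrapped <- paste(deparse(fits_wrapped[[1]]$call), collapse = " ")
print(call_dots)
print(call_wrapped)
```

An untested adaptation for the question would pass a wrapper such as function(f) npregbw(formula = f, data = df, regtype = "lc") to sfLapply. Note that the wrapper still records `f` rather than the formula itself, so this only addresses the data argument.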

How to reproduce topic modelling results with LDA package in R

I am using the lda package in R to perform Latent Dirichlet Allocation modelling. However, each time I run the program I get a different output.
Using set.seed() doesn't seem to help here, the way it does with the topicmodels package.
Assuming an identical input, is there a way to ensure that identical topics are found on subsequent executions of the code?
I execute the function as follows:
set.seed(11)
fit1 <- lda.collapsed.gibbs.sampler(documents = documents, K = topics, vocab = vocab,
num.iterations = iterations, alpha = alpha,
eta = eta, initial = NULL, burnin = 500,
compute.log.likelihood = TRUE)
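A quick way to test whether a fitting routine respects set.seed() is to run it twice from the same seed and compare the results exactly. The helper below, reproducible, is a hypothetical name, and the sample() call is only a stand-in for the sampler.

```r
# Run `fit_fun` twice from the same seed and compare the results exactly.
reproducible <- function(fit_fun, seed) {
  set.seed(seed)
  first <- fit_fun()
  set.seed(seed)
  second <- fit_fun()
  identical(first, second)
}

# Stand-in for a stochastic fit: sample() draws from R's own RNG.
seeded_ok <- reproducible(function() sample(10), seed = 11)
print(seeded_ok)
```

Wrapping the lda.collapsed.gibbs.sampler call in a function and passing it as fit_fun would show whether two seeded runs agree. A FALSE result would suggest that some randomness is not controlled by set.seed() in the installed version.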

glmmPQL crashes on inclusion of corSpatial object

Link to data (1170 obs, 9 variables, .Rd file)
Simply read it in using readRDS(file).
I'm trying to set up a GLMM using the glmmPQL function from the MASS package, including a random-effects part and accounting for spatial autocorrelation. However, R (version 3.3.1) crashes upon execution.
library(MASS)  # provides glmmPQL
library(nlme)  # provides corSpatial and Initialize
# setup model formula
fo <- hail ~ prec_nov_apr + t_min_nov_apr + srad_nov_apr + age
# setup corSpatial object
correl = corSpatial(value = c(10000, 0.1), form = ~ry + rx, nugget = TRUE,
fixed = FALSE, type = "exponential")
correl = Initialize(correl, data = d)
# fit model
fit5 <- glmmPQL(fo, random = ~1 | date, data = d,
correl = correl, family = binomial)
What I tried so far:
reduce number of observation
play with corSpatial parameters (range and nugget)
reduce number of fixed predictors
execute code on Windows, Linux (Debian) and Mac R installations
While I get no error message on my local pc (RStudio just crashes), running the script on a server returns the following error message:
R: malloc.c:3540: _int_malloc: Assertion `(fwd->size & 0x4) == 0' failed. Aborted
I'd use the INLA package to model this, as it supports spatially correlated random effects. The required code is a bit too long to post here, so I've placed it in a document at http://rpubs.com/INBOstats/spde

Fatal error with train() in caret on Windows 7, R 3.0.2, caret 6.0-21

I am trying to use train() in caret to fit a classification model, but I'm hitting some kind of unhandled exception and my R session crashes before outputting any error information in the R console.
Windows error:
R for Windows terminal front-end has stopped working
I am running Windows 7, R 3.0.2, caret 6.0-21, and have tried running this on both 32/64 versions of R, in R Studio and also directly in the R console, and am getting the same results each time.
Here is my call to train:
library("AppliedPredictiveModeling")
library("caret")
data("AlzheimerDisease")
data <- data.frame(predictors, diagnosis)
tuneGrid <- expand.grid(interaction.depth = 1:2, n.trees = 100, shrinkage = 0.1)
trainControl <- trainControl(method = "cv", number = 5, verboseIter = TRUE)
gbmFit <- train(diagnosis ~ ., data = data, method = "gbm", trControl = trainControl, tuneGrid = tuneGrid)
There are no more errors using this parameter grid instead:
tuneGrid <- expand.grid(interaction.depth = 1, n.trees = 100:101, shrinkage = 0.1)
However, I am still getting NaN for every value in the ValidDeviance column. Is this normal?
Note: My original problem is resolved, and this is a continuation from the comments section. Formatting blocks of code in the comments section is unreadable so I'm posting it up here. This is no longer a question regarding caret, but gbm instead.
I am still having issues, however, with direct calls to gbm using a single predictor with cv.folds specified. Here is the code:
library("AppliedPredictiveModeling")
library("caret")
data("AlzheimerDisease")
diagnosis <- as.numeric(diagnosis)
diagnosis[diagnosis == 1] <- 0
diagnosis[diagnosis == 2] <- 1
data <- data.frame(diagnosis, predictors[, 1])
gbmFit <- gbm(diagnosis ~ ., data = data, cv.folds = 5)
Again, this works without specifying cv.folds but with it, returns an error:
Error in checkForRemoteErrors(val) : 5 nodes produced errors; first error: incorrect number of dimensions
It is a bug that occurs when method = 'gbm' is used with a single model (i.e. nrow(tuneGrid) == 1). I'm about to release a new version, so I will fix this in that version.
One side note... it looks like you want to do classification. In that case, y should be a factor (and you shouldn't use only integers as the classes) otherwise it will be doing regression. These changes will work for now:
y <- factor(paste("Class", y, sep = ""))
and
tuneGrid <- expand.grid(interaction.depth = 1,
n.trees = 100:101,
shrinkage = 0.1)
Thanks,
Max
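The answer does not say why bare integer labels are discouraged. One likely reason (an assumption here) is that class labels can end up as column names, and "0" or "1" are not syntactically valid R names. A small base-R check of the suggested conversion:

```r
# Numeric 0/1 outcome, as produced in the question.
y <- c(0, 1, 1, 0, 1)

# Conversion suggested in the answer: prefix each label with "Class".
y_factor <- factor(paste("Class", y, sep = ""))
class_levels <- levels(y_factor)

# make.names() leaves valid names unchanged and rewrites invalid ones.
names_are_valid <- all(make.names(class_levels) == class_levels)

# For comparison: the bare levels "0" and "1" are not valid names.
bare_levels <- levels(factor(y))
bare_levels_valid <- all(make.names(bare_levels) == bare_levels)
print(c(prefixed = names_are_valid, bare = bare_levels_valid))
```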

How to save a glmnet model to a file in R?

When I am using R, how can I save a model built by glmnet to a file, and then read it from the file so as to use it to predict?
Is it also the same if I use cv.glmnet to build the model?
Thanks!
Maybe I misunderstand your point, but you can always use the save function to store your R object in an .RData file. Next time, simply call load("YourFile.RData") to load the object(s) into the session.
library(glmnet)
library(ISLR)
# Data and model
auto = ISLR::Auto
mod = cv.glmnet(as.matrix(auto[1:300,2:6]), as.matrix(auto[1:300,1]), type.measure = "mse", nfolds = 5)
# Predict on the held-out rows (301:392, so that training row 300 is not reused)
predict(mod, newx = as.matrix(auto[301:392,2:6]), s = "lambda.min")
# Save model
save(mod, file="D:/mymodel.RData")
rm(mod)
# Reload model
load("D:/mymodel.RData")
predict(mod, newx = as.matrix(auto[301:392,2:6]), s = "lambda.min")
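The save()/load() pair restores the object under its original name, while saveRDS()/readRDS() stores a single object and lets the caller choose the name on reading. The sketch below contrasts the two with a base-R lm fit as a stand-in for the glmnet model; the temporary paths and variable names are illustrative assumptions.

```r
fit <- lm(mpg ~ wt, data = mtcars)
rdata_path <- tempfile(fileext = ".RData")
rds_path <- tempfile(fileext = ".rds")

save(fit, file = rdata_path)    # stores the object together with its name
saveRDS(fit, file = rds_path)   # stores the object only

rm(fit)
restored_names <- load(rdata_path)  # recreates `fit`, returns the names loaded
other_name <- readRDS(rds_path)     # caller picks the name

# Both restored models should give the same predictions.
preds_match <- isTRUE(all.equal(predict(fit, mtcars),
                                predict(other_name, mtcars)))
print(preds_match)
```

Either approach should apply equally to objects returned by glmnet or cv.glmnet, since both functions serialize ordinary R objects.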
