How do I extract random effects from MCMCglmm? - r

I am looking for a command similar to ranef() used in nlme, lme4, and brms that will allow me to extract the individual random effects in my MCMCglmm model. In my dataset, I have 40 providers and I would like to extract the random effects for each provider and plot them in a caterpillar plot. Any suggestions would be great. Thank you.
In case it is helpful, here is my MCMCglmm model:
prior.3 <- list(R = list(R1 = list(V = diag(2), nu = 0.002)),
G = list(G1 = list(V = diag(2), nu = 0.002),
G2 = list(V = diag(2), nu = 0.002)))
mc_mod2 <- MCMCglmm(outcome ~ 1, data = filter(data, rem2 == "white" | rem2 == "rem"),
random = ~ idh(rem2):id + us(rem2):provider,
rcov = ~idh(rem2):units,
verbose = TRUE,
prior = prior.3,
family = "gaussian",
nitt = 100000, burnin = 5000,
pr = TRUE)

A little more detail, since the package doesn't seem to have caterpillar plots built in: note you need to use pr=TRUE when calling MCMCglmm in order to store the random effects values.
library(MCMCglmm)
data(PlodiaPO)
model1 <- MCMCglmm(PO~1, random=~FSfamily, data=PlodiaPO, verbose=FALSE,
nitt=1300, burnin=300, thin=1,
pr=TRUE)
if (!require("postMCMCglmm")) {
devtools::install_github("JWiley/postMCMCglmm")
library("postMCMCglmm")
}
ranef() appears to return a matrix of the random effects (rows=levels, columns=samples). Convert to a data frame with mean and quantiles:
qfun <- function(x,lev) unname(quantile(x,lev))
rsum <- as.data.frame(t(apply(ranef(model1),1,
function(x) c(est=mean(x),
min=qfun(x,0.025),max=qfun(x,0.975)))))
Order for plotting:
rsum$term <- reorder(factor(rownames(rsum)),
rsum$est)
Plot:
library(ggplot2)
ggplot(rsum,aes(term,est))+
geom_pointrange(aes(ymin=min,ymax=max))+
coord_flip()
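If installing postMCMCglmm is not an option, the same summary can be built by hand. As far as I understand, with pr=TRUE the random-effect samples are stored next to the fixed effects in the Sol component of the fitted model (rows = samples, columns = effects). The sketch below uses a simulated matrix named sol as a stand-in for as.matrix(model1$Sol) so that it runs without the package; the column names ("FSfamily.1", ...) are assumptions, so check colnames(model1$Sol) on a real fit.

```r
# Stand-in for as.matrix(model1$Sol): rows = MCMC samples, columns = effects.
# Column names here are hypothetical; inspect colnames(model1$Sol) for real ones.
set.seed(42)
n_samp <- 1000
n_lev  <- 8
sol <- cbind(rnorm(n_samp, 1, 0.1),
             sapply(seq_len(n_lev), function(j) rnorm(n_samp, (j - 4.5) / 5, 0.2)))
colnames(sol) <- c("(Intercept)", paste0("FSfamily.", seq_len(n_lev)))

# Keep only the random-effect columns for the grouping factor of interest.
re <- sol[, grepl("^FSfamily\\.", colnames(sol)), drop = FALSE]

# Posterior mean and 95% interval per level (levels are columns here).
rsum <- data.frame(
  term = colnames(re),
  est  = colMeans(re),
  min  = apply(re, 2, quantile, probs = 0.025),
  max  = apply(re, 2, quantile, probs = 0.975),
  row.names = NULL
)

# Order levels by estimate, as for a caterpillar plot.
rsum <- rsum[order(rsum$est), ]
print(rsum)
```

The resulting rsum has the same columns (term, est, min, max) as the data frame used in the ggplot call above, so the plotting code can be reused unchanged.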

I overlooked an additional package that needed to be installed (thanks for pointing this out, Ben).
To be able to run ranef(), simply install the postMCMCglmm package - https://github.com/jwiley/postMCMCglmm/
#install.packages("devtools")
require(devtools)
install_github("JWiley/postMCMCglmm")

Related

Kfold CV in brms

I am trying to use kfold CV as a means of evaluating a model run using brms and I feel like I'm missing something. As a reproducible example, my data are structured as a binary response (0, 1) dependent on the length of an individual. Here is some code to generate and plot data similar to those I am working with:
library(brms)
library(tidyverse)
library(loo)
length <- seq(0, 100, by = 1)
n_fish_per_length <- 10
a0 <- -48
a1 <- 2
a2 <- -0.02
prob <- plogis(a0 + a1 * length + a2 * length^2)
plot(length, prob , type = 'l')
sim_data <-
expand_grid(fish_id = seq_len(n_fish_per_length),
length = length) %>%
mutate(prob_use = plogis(a0 + a1 * length + a2 * length^2)) %>%
mutate(is_carp = rbinom(n = n(), size = 1, prob= prob_use))
ggplot(sim_data, aes(x = length, y = is_carp)) +
geom_jitter(width = 0, height = 0.05) +
geom_smooth(method = "glm", formula = y ~ x + I(x^2),
method.args = list(family = binomial(link = "logit")))
I then use brms to run my model.
Bayes_Model_Binary <- brm(formula = is_carp ~ length + I(length^2),
data=sim_data,
family = bernoulli(link = "logit"),
warmup = 2500,
iter = 5000,
chains = 4,
inits= "0",
cores=4,
seed = 123)
summary(Bayes_Model_Binary)
I'd like to use kfold CV to evaluate the model. I can use something like this:
kfold(Bayes_Model_Binary, K = 10, chains = 1, save_fits = T)
but the response in my data is highly imbalanced (~18% = 1, ~82% = 0) and my reading suggests that I need to use stratified kfold CV to account for this. If I use:
sim_data$fold <- kfold_split_stratified(K = 10, x = sim_data$is_carp)
the data are split the way I would expect but I'm not sure what the best way is to move forward with the CV process from here. I saw this post https://mc-stan.org/loo/articles/loo2-elpd.html, but I'm not sure how to modify this to work with a brmsfit object. Alternatively, it appears that I should be able to use:
kfold(Bayes_Model_Binary, K = 10, folds = 'stratified', group = sim_data$is_carp)
but this throws an error, likely because is_carp is the response rather than a predictor in the model. What would my group be in this context? Am I missing or misinterpreting something here? I'm assuming there is a very simple solution that I am overlooking, but I would appreciate any thoughts.
After some additional digging and learning how to access information about each fold in the analysis, I was able to determine that the structure of the data (proportion of 0s and 1s in the response) is maintained using the default settings in the kfold() function. To do this I used the following code.
First, save the kfold CV analysis as an object.
kfold1 <- kfold(Bayes_Model_Binary, K = 10, save_fits = T)
kfold1$fits is a list of the model fitting results and the observations used in the test data set (omitted) for each fold.
From this information, I created a loop to print the proportion of observations in each training data set where is_carp = 1 (could also do this for each test data set) with the following code.
for(i in 1:10){
print(length(which(sim_data$is_carp[-kfold1$fits[i, ]$omitted] == 1)) /
nrow(sim_data[-kfold1$fits[i, ]$omitted, ]))
}
[1] 0.1859186
[1] 0.1925193
[1] 0.1991199
[1] 0.1914191
[1] 0.1881188
[1] 0.1848185
[1] 0.1936194
[1] 0.1980198
[1] 0.190319
[1] 0.1870187
and it's easy to then compare these proportions with the proportion of observations where is_carp = 1 from the original data set.
length(which(sim_data$is_carp == 1)) / nrow(sim_data)
[1] 0.1910891
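To make this check independent of how the folds were produced, the fold assignment can also be built and inspected directly. Below is a minimal base-R sketch of stratified assignment. stratified_folds is a hypothetical helper written for illustration (loo::kfold_split_stratified(), used in the question, serves the same purpose), and the simulated response only mimics the roughly 19% prevalence. My understanding is that brms::kfold() also accepts such a vector of fold indices through its folds argument, but treat that as an assumption to verify against the documentation.

```r
set.seed(123)
y <- rbinom(1010, size = 1, prob = 0.19)   # simulated stand-in for sim_data$is_carp

# Hypothetical helper: assign fold ids 1..K separately within each class,
# so every fold gets roughly the same share of 0s and 1s.
stratified_folds <- function(x, K = 10) {
  folds <- integer(length(x))
  for (cls in unique(x)) {
    idx <- which(x == cls)
    idx <- idx[sample.int(length(idx))]          # shuffle within the class
    folds[idx] <- rep_len(seq_len(K), length(idx))
  }
  folds
}

fold <- stratified_folds(y, K = 10)

# Proportion of 1s in each held-out fold versus the full data.
prop_by_fold <- tapply(y, fold, mean)
print(round(prop_by_fold, 3))
print(mean(y))
```

Because the class counts are dealt out fold by fold, the per-fold proportions can differ from the overall proportion only through rounding.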

sampling from posterior predictive distribution (stan vs inla)

I'm trying to use functions from the bayesplot package on an INLA object and am a little unsure of how to draw from the posterior predictive distribution. I think I almost have it, but the rstan draws are more variable than the INLA ones.
In rstan, using the simplified example from bayesplot vignette I can:
library(bayesplot)
library(ggplot2)
library(rstanarm)
library(ggpubr)
library(tidyverse)
#rstan model set up
roaches$roach100 <- roaches$roach1 / 100 # pre-treatment number of roaches (in 100s)
fit_poisson <- stan_glm(y ~ roach100 + treatment + senior, offset = log(exposure2), family = poisson(link = "log"), data = roaches, seed = 1111, refresh = 0)
#In order to use the PPC functions from the bayesplot package we need a vector y of outcome values:
y <- roaches$y
#and a matrix yrep of draws from the posterior predictive distribution,
yrep_poisson <- posterior_predict(fit_poisson, draws = 500)
#then plot:
p1 <- bayesplot::ppc_dens_overlay(y, yrep_poisson[1:50, ])
p1
I want to replicate that plot for an INLA object. According to the bayesplot vignette this is possible: it provides code defining a simple pp_check method for fitted model objects of a custom class, e.g. foo:
pp_check.foo <- function(object, type = c("multiple", "overlaid"), ...) {
type <- match.arg(type)
y <- object[["y"]]
yrep <- object[["yrep"]]
stopifnot(nrow(yrep) >= 50)
samp <- sample(nrow(yrep), size = ifelse(type == "overlaid", 50, 5))
yrep <- yrep[samp, ]
if (type == "overlaid") {
ppc_dens_overlay(y, yrep, ...)
} else {
ppc_hist(y, yrep, ...)
}
}
To use pp_check.foo we can just make a list with y and yrep components and give it class foo:
x <- list(y = rnorm(200), yrep = matrix(rnorm(1e5), nrow = 500, ncol = 200))
class(x) <- "foo"
#create plot above:
pp_check(x, type = "overlaid")
INLA
#create same model but in inla:
library(INLA)
fit_poisson_inla <- inla(y ~ roach100 + treatment + senior, offset = log(exposure2), data = roaches,
control.predictor = list(compute = T),
family = "poisson")
inla_object_name$marginals.fitted.values returns a posterior predictive distribution for each y:
fit_poisson_inla$marginals.fitted.values
#so to get the distribution for the first observation:
fitted.Predictor.1 <- fit_poisson_inla$marginals.fitted.values[[1]]
I think repeatedly sampling from this would give me what I need, but only 75 values (see dim(fitted.Predictor.1)) are used per observation to build this distribution, whereas I would want to sample from the full range of values. I think we can do this (section 4.3 here) by applying inla.tmarginal to the linear predictor:
fitted_dist <- fit_poisson_inla$marginals.linear.predictor
#should i have used "inla.rmarginal(n, marginal)"?
marginal_dist <- lapply(fitted_dist, function(y) inla.tmarginal(function(x) {exp(x)}, y)) %>% map(~ as.data.frame(.) %>% rename(., xx = x))
#resample 500 times
yrep_poisson_inla <- as.matrix(bind_rows(rerun(500, lapply(marginal_dist, function(x) sample(x$xx, 1)) %>% as.data.frame())))
#convert to class foo for pp_check
x <- list(y = y, yrep = yrep_poisson_inla[1:50, ])
class(x) <- "foo"
p2 <- pp_check(x, type = "overlaid")
#plot
ggarrange(p1, p2, ncol = 1, nrow = 2, labels = c("rstan", "inla sample"))
My question is how do I correctly get a matrix of draws from the posterior predictive distribution from this inla (fit_poisson_inla) object to pass into pp_check? yrep_poisson produces discrete values while yrep_poisson_inla produces continuous values. There is a lot more variation in the rstan draws than INLA (second plot). Is what I have done correct and this is just some sampling issue or is it an artifact of the different methods? In more complicated examples the differences could be substantial.
Thanks
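One likely reason for the difference, as far as I can tell: marginals.fitted.values describes the posterior of the Poisson mean for each observation, not the distribution of a new observation. Sampling from it therefore gives continuous values and leaves out the Poisson sampling noise that posterior_predict() adds. The base-R sketch below illustrates only that idea; it uses neither INLA nor Stan, and lambda_draws is a simulated stand-in for posterior draws of the mean.

```r
set.seed(1111)
n_obs   <- 200
n_draws <- 500

# Simulated stand-in for posterior draws of the Poisson mean:
# one row per posterior draw, one column per observation.
mu_true      <- exp(rnorm(n_obs, mean = 2, sd = 0.5))
lambda_draws <- matrix(
  rlnorm(n_draws * n_obs, meanlog = rep(log(mu_true), each = n_draws), sdlog = 0.05),
  nrow = n_draws, ncol = n_obs
)

# Draws of the mean only: continuous, and missing the observation noise.
yrep_mean <- lambda_draws

# Posterior predictive draws: add Poisson noise on top of each mean draw.
yrep_pred <- matrix(rpois(length(lambda_draws), lambda_draws),
                    nrow = n_draws, ncol = n_obs)

# Predictive draws are integer-valued and more variable per observation.
print(all(yrep_pred == round(yrep_pred)))
print(c(mean_only  = mean(apply(yrep_mean, 2, var)),
        predictive = mean(apply(yrep_pred, 2, var))))
```

Applied to the question, this would mean passing draws of the mean through rpois() before handing them to pp_check. Sampling each observation's marginal separately also ignores the joint posterior; INLA's inla.posterior.sample() (which, I believe, needs control.compute = list(config = TRUE) at fitting time) is the usual route to joint draws.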

How do you run nonlinear moderation using the nlsem package in R?

I'm trying to learn how to use the nlsem package in R to fit a nonlinear SEMM, but I keep running into the error "Posterior probability could not be calculated properly. Choose different starting parameters" when I try to create the res object. I'm trying to estimate a nonlinear model where latent variable tas predicts latent variable cts, moderated by latent variable ams. I'm still pretty new to R and very new to nonlinear analyses, so any help at all would be appreciated!
My code so far:
##nonlinear SEM
#Select data
FPerpSEMM<-subset(FPerp,
select=(c("tas1", "tas3", "tas6", "tas7", "tas9", "tas13","tas14", "AMSEscalate",
"AMSNegAttribution", "AMSSelfAware", "AMSCalming", "cts_5", "cts_25",
"cts_29", "cts_35", "cts_49", "cts_65", "cts_67", "cts_69")))
FPerpSEMM$x1<-FPerpSEMM$tas1
FPerpSEMM$x2<-FPerpSEMM$tas3
FPerpSEMM$x3<-FPerpSEMM$tas6
FPerpSEMM$x4<-FPerpSEMM$tas7
FPerpSEMM$x5<-FPerpSEMM$tas9
FPerpSEMM$x6<-FPerpSEMM$tas13
FPerpSEMM$x7<-FPerpSEMM$tas14
FPerpSEMM$x8<-FPerpSEMM$AMSEscalate
FPerpSEMM$x9<-FPerpSEMM$AMSNegAttribution
FPerpSEMM$x10<-FPerpSEMM$AMSSelfAware
FPerpSEMM$x11<-FPerpSEMM$AMSCalming
FPerpSEMM$y1<-FPerpSEMM$cts_5
FPerpSEMM$y2<-FPerpSEMM$cts_25
FPerpSEMM$y3<-FPerpSEMM$cts_29
FPerpSEMM$y4<-FPerpSEMM$cts_35
FPerpSEMM$y5<-FPerpSEMM$cts_49
FPerpSEMM$y6<-FPerpSEMM$cts_65
FPerpSEMM$y7<-FPerpSEMM$cts_67
FPerpSEMM$y8<-FPerpSEMM$cts_69
FPerpSEMMr1<-subset(FPerpSEMM,
select=(c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11",
"y1","y2","y3","y4","y5","y6","y7","y8")))
#Create dataframe containing only complete cases
FPerpSEMMcc<-na.omit(FPerpSEMMr1)
# load data
dat <- as.matrix(FPerpSEMMcc[, c(12:19, 1:7, 8:11)])
# specify model of class SEMM
model<- specify_sem(num.x = 11, num.y = 8, num.xi = 2, num.eta = 1,
xi = "x1-x7,x8-x11", eta = "y1-y8",
num.classes = 3, interaction = "xi1:xi2", rel.lat = "eta1~xi1+xi2",
constraints = "direct1")
class(model)
#fit model
dat <- as.matrix(FPerpSEMMcc[, c(12:19, 1:7, 8:11)])
set.seed(911)
pars.start <- runif(count_free_parameters(model))
res <- em(model, dat, pars.start, convergence = 0.1, max.iter = 200)
summary(res)
plot(res)

How to estimate the Kalman Filter with 'KFAS' R package, with an AR(1) transition equation?

I am using 'KFAS' package from R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = Z_t * x_t + \eps_t (measurement)
x_t = T_t * x_{t-1} + R_t * \eta_t (transition),
with \eps_t ~ N(0,H_t) and \eta_t ~ N(0,Q_t).
So, I want to estimate the variances H_t and Q_t, but also T_t, the AR(1) coefficient. My code is as follows:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
fit <- fitSSM(ss_model, inits = c(0,0.6,0), method = 'L-BFGS-B')
But it returns: "Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)),: System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07"
The NA definitions for the variances work well, as documented in the package's paper. However, it seems this cannot be done for the AR coefficients. Does anyone know how I can do this?
Note that I am aware of the SSMarima function, which eases the definition of the transition equation as ARIMA models. Although I am able to estimate the AR(1) coef. and Q_t this way, I still cannot estimate the \eps_t variance (H_t). Moreover, I am migrating my Kalman filter codes from EViews to R, so I need to learn SSMcustom for other models that are more complicated.
Thanks!
It seems that you are missing something in your example, as your error message comes from the function fitSSM. If you want to use fitSSM for estimating general state space models, you need to provide your own model updating function. The default behaviour can only handle NA's in covariance matrices H and Q. The main goal of fitSSM is just to get started with simple stuff. For complex models and/or large data, I would recommend using your self-written objective function (with help of logLik method) and your favourite numerical optimization routines manually for maximum performance. Something like this:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
objf <- function(pars, model, estimate = TRUE) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
if (estimate) {
-logLik(model)
} else {
model
}
}
opt <- optim(c(1, 0.5, 1), objf, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100), model = ss_model)
ss_model_opt <- objf(opt$par, ss_model, estimate = FALSE)
Same with fitSSM:
updatefn <- function(pars, model) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
identical(ss_model_opt, fit$model)
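For readers who want to see what such an objective function evaluates, here is a minimal base-R sketch of the Gaussian log-likelihood for the scalar model above, computed with a hand-written Kalman filter and maximised with optim. It does not use KFAS. The simulated data, the function name kalman_negloglik, and the stationary initial variance Q / (1 - T^2) are my assumptions; the initial-state settings in particular may differ from those of the KFAS model, so the estimates need not match exactly.

```r
set.seed(100)
n <- 500
# Simulated AR(1) state observed with Gaussian noise (true H = 1, T = 0.6, Q = 0.5).
x <- as.numeric(arima.sim(n = n, list(ar = 0.6), sd = sqrt(0.5)))
y <- x + rnorm(n, sd = 1)

# Negative log-likelihood via the scalar Kalman filter.
# pars = c(H, T, Q): measurement variance, AR(1) coefficient, state variance.
kalman_negloglik <- function(pars, y) {
  H <- pars[1]; phi <- pars[2]; Q <- pars[3]
  a <- 0                       # predicted state mean
  P <- Q / (1 - phi^2)         # predicted state variance (stationary prior)
  ll <- 0
  for (t in seq_along(y)) {
    v  <- y[t] - a             # one-step-ahead prediction error
    Fv <- P + H                # its variance
    ll <- ll - 0.5 * (log(2 * pi) + log(Fv) + v^2 / Fv)
    K  <- P / Fv               # Kalman gain
    a_filt <- a + K * v        # filtered mean
    P_filt <- P * (1 - K)      # filtered variance
    a <- phi * a_filt          # predict next state
    P <- phi^2 * P_filt + Q
  }
  -ll
}

opt <- optim(c(1, 0.5, 1), kalman_negloglik, y = y, method = "L-BFGS-B",
             lower = c(1e-6, -0.99, 1e-6), upper = c(100, 0.99, 100))
print(setNames(opt$par, c("H", "T", "Q")))
```

The bounds play the same role as in the optim and fitSSM calls above: they keep the variances positive and the AR coefficient inside the stationary region.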

Prediction on Neural Network in R

I want to get the accuracy or the RMSE of the prediction result of a neural network. I started with a confusion matrix but, as previous answers indicated, a confusion matrix only gives valid results for non-continuous (categorical) variables.
Is there any way to get the accuracy or the error rate of a neural network prediction?
As an example here is the code I've got until now:
library(nnet)
library(caret)
library(e1071)
data(rock)
newformula <- perm ~ area + peri + shape
y <- rock[, "perm"]
x <- rock[!colnames(rock)%in% "perm"]
original <- datacol(rock,"perm")
nnclas_model <- nnet(newformula, data = rock, size = 4, decay = 0.0001, maxit = 500)
nnclas_prediction <- predict(nnclas_model, x)
nnclas_tab <- table(nnclas_prediction, y)
rmse <- sqrt(mean((original - nnclas_prediction)^2))
Does anyone know how I can make this work, or how I can get the accuracy or the RMSE of the neural network prediction?
Any help will be deeply appreciated.
I don't know about "nnet", but I have used the "neuralnet" library and was able to get the RMSE. Here is my full code. Just change training_Data and testing_Data to suit your data, and replace "Channel" with the name of your target attribute.
dat <- read.csv("Give path of your data file here")
summary(dat)
cleandata <- dat
cleandata <- na.omit(cleandata)
#scaling
apply(cleandata,MARGIN = 2, FUN = function(x)sum(is.na(x)))
maxs = apply(cleandata, MARGIN = 2, max)
mins = apply(cleandata, MARGIN = 2, min)
scaledData = as.data.frame(scale(cleandata, center = mins, scale = maxs - mins))
summary(scaledData)
#Splitting data in 80:20 ratio
train = sample(1:nrow(scaledData), nrow(scaledData)*0.8)
test = -train
training_Data = scaledData[train,]
testing_Data = scaledData[test,]
dim(training_Data)
dim(testing_Data)
#neural net
library(neuralnet)
n <- names(training_Data)
f <- as.formula(paste("Channel ~", paste(n[!n %in% "Channel"], collapse = " + ")))
neuralnet_Model <- neuralnet(f,data = training_Data, hidden = c(2,1))
plot(neuralnet_Model)
neuralnet_Model$result.matrix
pred_neuralnet<-compute(neuralnet_Model,testing_Data[,2:8])
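#undo the min-max scaling using the range of the original (unscaled) data
pred_neuralnet.scaled <- pred_neuralnet$net.result *(max(cleandata$Channel)-min(cleandata$Channel))+min(cleandata$Channel)
real.values <- (testing_Data$Channel)*(max(cleandata$Channel)-min(cleandata$Channel))+min(cleandata$Channel)
MSE.neuralnetModel <- sum((real.values - pred_neuralnet.scaled)^2)/nrow(testing_Data)
MSE.neuralnetModel
#RMSE is the square root of the MSE
RMSE.neuralnetModel <- sqrt(MSE.neuralnetModel)
RMSE.neuralnetModel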
plot(real.values, pred_neuralnet.scaled, col='red',main='Real vs predicted',pch=18,cex=0.7)
abline(0,1,lwd=2)
legend('bottomright',legend='NN',pch=18,col='red', bty='n')
As mentioned in the comments, confusion matrices are for classification problems. If you meant to classify perm according to its levels, then the following code should work for you.
library(nnet)
library(caret)
library(e1071)
data(rock)
rock$perm <- as.factor(rock$perm)
nnclas_model <- nnet(perm ~ area + peri + shape, data = rock,
size = 4, decay = 0.0001, maxit = 500)
x <- rock[, 1:3]
y <- rock[, 4]
yhat <- predict(nnclas_model, x, type = 'class')
# use the same factor levels for predictions and reference
confusionMatrix(factor(yhat, levels = levels(y)), y)
If you mean to treat perm as continuous, the confusion matrix doesn't make any sense. You should think in terms of mean-squared error instead.
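A minimal sketch of that continuous case, assuming the nnet package (a recommended package that ships with most R installations) and the built-in rock data. The key difference from the question's call is linout = TRUE, which asks nnet for a linear output unit instead of a logistic one restricted to [0, 1]. Scaling the predictors is my own choice here, not something the original code did.

```r
library(nnet)   # recommended package shipped with most R installations
data(rock)

set.seed(1)
# Scale predictors so the optimiser is not dominated by 'area' (in the thousands).
x   <- as.data.frame(scale(rock[, c("area", "peri", "shape")]))
dat <- cbind(x, perm = rock$perm)

# linout = TRUE requests a linear output unit, i.e. regression rather than
# a logistic output squashed into [0, 1].
nn_reg <- nnet(perm ~ area + peri + shape, data = dat,
               size = 4, decay = 0.0001, maxit = 500,
               linout = TRUE, trace = FALSE)

pred <- as.numeric(predict(nn_reg, dat))

# Root-mean-squared error on the training data (optimistic; use held-out
# data or cross-validation for an honest estimate).
rmse <- sqrt(mean((dat$perm - pred)^2))
print(rmse)
```

The RMSE is in the units of perm, so it can be compared directly with the spread of the response, e.g. sd(rock$perm).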
