Text string in knitr (RStudio): Error in parse

I am attempting to use knitr through RStudio to document a model that saves a text string to a *.txt file.
When doing so, I get this R Markdown error message:
Error in parse(text = x, srcfile = src) : <text>:2:24: unexpected INCOMPLETE_STRING
14: var.m <- 1/tau.m # between-trial variance
15:
Calls: <Anonymous> ... <Anonymous> -> parse_all -> parse_all.character -> parse
Does anyone know how to fix this?
This string works fine:
Modelstring.baseline = " Text goes here "
This string also works fine:
Modelstring.baseline = "
# Binomial likelihood, logit link, MTC
# Fixed effect model
#CV mortality
model{ # *** PROGRAM STARTS
for(i in 1:ns){ # LOOP THROUGH STUDIES
mu[i] ~ dnorm(0,.0001) # vague priors for all trial baselines
for (k in 1:na[i]) { # LOOP THROUGH ARMS
r[i,k] ~ dbin(p[i,k],n[i,k]) # binomial likelihood
logit(p[i,k]) <- mu[i] + d[t[i,k]]-d[t[i,1]] # model for linear predictor
rhat[i,k] <- p[i,k] * n[i,k] # expected value of the numerators
dev[i,k] <- 2 * (r[i,k] * (log(r[i,k])-log(rhat[i,k])) # Deviance contribution
+ (n[i,k]-r[i,k]) * (log(n[i,k]-r[i,k]) - log(n[i,k]-rhat[i,k])))
}
resdev[i] <- sum(dev[i,1:na[i]]) # summed residual deviance contribution for this trial
}
totresdev <- sum(resdev[]) # Total Residual Deviance
d[1]<- 0 # treatment effect is zero for reference treatment
for (k in 2:nt) { d[k] ~ dnorm(0,.0001) } # vague priors for treatment effects
"
While this string generates a parser error:
Modelstring.baseline = "
model{ # *** PROGRAM STARTS
for (i in 1:ns)
{ # LOOP THROUGH STUDIES
r[i] ~ dbin(p[i],n[i]) # Likelihood
logit(p[i]) <- mu[i] # Log-odds of response
mu[i] ~ dnorm(m,tau.m) # Random effects model
}
mu.new ~ dnorm(m,tau.m) # predictive dist. (log-odds)
m ~ dnorm(0,.0001) # vague prior for mean
var.m <- 1/tau.m # between-trial variance
#---Non-informative prior
#tau.m <- pow(sd.m,-2)
#sd.m ~ dunif(0,5)
#---Vaguely informative prior
#tau.m ~ dgamma(0.001,.001)
#sd.m ~ pow(tau.m,-0.5)
#---Informative prior R.M Turner et al LN(-3.95, 1.79)
tau.m <- 1/tausq
tausq ~ dlnorm(-3.95, 0.31) #0.31 = 1/(1.79*1.79)
}
"

Related

How to manually calculate the residuals of a linear model in R

I am trying to manually calculate the r-squared given by lm() in R.
Considering:
fit <- lm(obs_values ~ preds_values, df)
with sd(df$obs_values) == sd(df$preds_values) and mean(df$obs_values) == mean(df$preds_values).
To do so I can extract the residuals with res_a = residuals(fit) and then plug them into the formula as:
y = sum( (df$obs_values - mean(df$obs_values))^2 )
r_squared = 1 - sum(res_a^2)/y
Here I get the expected r-squared.
Now I would like to get the residuals manually.
It should be as trivial as:
res_b = df$obs_values - df$preds_values
but for some reason res_b is different from res_a...
You can't just do y - x in a regression y ~ x to get the residuals. Where have the regression coefficients gone?
fit <- lm(y ~ x)
b <- coef(fit)
resi <- y - (b[1] + b[2] * x)
You have many options:
## Residuals manually
# option 1: use the estimated coefficients
beta_hat <- coef(fit)
obs_values_hat <- beta_hat["(Intercept)"] + beta_hat["preds_values"] * df$preds_values
u_hat <- df$obs_values - obs_values_hat # residuals
# option 2: use the fitted values
obs_values_hat <- fitted(fit)
u_hat <- df$obs_values - obs_values_hat # residuals
# (option 3 - not manually) or just u_hat <- resid(fit)
## R-squared manually
# option 1: share of variance explained by the fitted values
var(obs_values_hat) / var(df$obs_values)
# option 2: one minus the share of residual variance
1 - var(u_hat) / var(df$obs_values)
# option 3: squared correlation between observed and fitted values
cor(df$obs_values, obs_values_hat)^2
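For completeness, a small self-contained check (simulated data, not the asker's df) that the manual residuals and R-squared match R's built-ins might look like this:
set.seed(1)                                   # simulated data, not the asker's df
df <- data.frame(preds_values = rnorm(50))
df$obs_values <- 1 + 2 * df$preds_values + rnorm(50)
fit <- lm(obs_values ~ preds_values, data = df)
b <- coef(fit)
res_manual <- df$obs_values - (b[1] + b[2] * df$preds_values)  # y - (b0 + b1*x)
all.equal(unname(res_manual), unname(resid(fit)))              # TRUE
rsq_manual <- 1 - sum(res_manual^2) / sum((df$obs_values - mean(df$obs_values))^2)
all.equal(rsq_manual, summary(fit)$r.squared)                  # TRUE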

Non-linear fit with R

I am trying to obtain the first three coefficients of Cauchy's dispersion equation for silicon. Using a CSV file containing the refractive index at several wavelengths (that you can find here), I try to fit the following model:
library(readr)
library(tidyverse)
library(magrittr)
library(modelr)
library(broom)
library(splines)
# CSV parsing
RefractiveIndexINFO <- read_csv("./silicon-index.csv")
# Cleaning the output of the csv-parsing
indlong = tibble(RefractiveIndexINFO$`Wavelength. µm`,RefractiveIndexINFO$n)
names(indlong) = c('w','n')
# Remove some wavelengths that might not fit
indlong_non_uv = indlong %>% filter(indlong$w >= 0.4)
# Renaming variables
w = indlong_non_uv$w
n = indlong_non_uv$n
# Creating the non-linear model
model = nls(n ~ a + b*ns(w,-2) + c*ns(w,-4), data = indlong_non_uv)
# Gathering information on the fitted model
cor(indlong_non_uv$n,predict(model))
tidy(model)
Which gives the following error :
Error in c * ns(w, -4) : non-numeric argument to binary operator
How can I circumvent this situation and get the three coefficients (a, b, c) in one go?
Obviously, using model = nls(n ~ a + b*ns(w,-2), data = indlong_non_uv) does not give an error.
Try this. Note that ns() from the splines package builds a natural-spline basis rather than raising w to a power, so the Cauchy terms are written directly as w^(-2) and w^(-4), and starting values are supplied to nls:
library(readr)
library(tidyverse)
library(magrittr)
library(modelr)
library(broom)
library(splines)
# CSV parsing
RefractiveIndexINFO <- read_csv("aspnes.csv")
RefractiveIndexINFO <- RefractiveIndexINFO[1:46,]
RefractiveIndexINFO <- as.data.frame(apply(RefractiveIndexINFO,2,as.numeric))
names(RefractiveIndexINFO) <- c('w','n')
indlong_non_uv = RefractiveIndexINFO %>% filter(RefractiveIndexINFO$w >= 0.4)
# Creating the nonlinear model
model <- nls(n ~ a + b*w^(-2) + c*w^(-4), data = indlong_non_uv,
start=list(a=1, b=1, c=1))
# Gathering information on the fitted model
cor(indlong_non_uv$n,predict(model))
# [1] 0.9991006
tidy(model)
#   term    estimate   std.error statistic      p.value
# 1    a  3.65925186 0.039368851 92.947896 9.686805e-20
# 2    b -0.04981151 0.024099580 -2.066904 5.926046e-02
# 3    c  0.05282668 0.003306895 15.974707 6.334197e-10
Alternatively, you can use linear regression:
model2 <- lm(n ~ I(w^(-2)) + I(w^(-4)), data = indlong_non_uv)
summary(model2)
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  3.659252   0.039369  92.948  < 2e-16 ***
# I(w^(-2))   -0.049812   0.024100  -2.067   0.0593 .
# I(w^(-4))    0.052827   0.003307  15.975 6.33e-10 ***
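As a quick sanity check (assuming both fits above ran), the two approaches should agree, since the Cauchy model is linear in its coefficients:
coef(model)    # a, b, c from nls
coef(model2)   # "(Intercept)", "I(w^(-2))", "I(w^(-4))" from lm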

How to conduct linear hypothesis test on regression coefficients with a clustered covariance matrix?

I am interested in calculating estimates and standard errors for linear combinations of coefficients after a linear regression in R. For example, suppose I have the regression and test:
data(mtcars)
library(multcomp)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
summary(glht(lm1, linfct = 'cyl + hp = 0'))
This will estimate the value of the sum of the coefficients on cyl and hp, and provide the standard error based on the covariance matrix produced by lm.
But suppose I want to cluster my standard errors on a third variable:
data(mtcars)
library(multcomp)
library(lmtest)
library(multiwayvcov)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)
ct1 <- coeftest(lm1,vcov. = vcv)
ct1 contains the SEs from clustering by am. However, if I try to use the ct1 object in glht, I get an error saying:
Error in modelparm.default(model, ...) :
no ‘coef’ method for ‘model’ found!
Any advice on how to do the linear hypothesis test with the clustered variance-covariance matrix?
Thanks!
glht(ct1, linfct = 'cyl + hp = 0') won't work, because ct1 is not a glht object and cannot be coerced to one via as.glht. I don't know whether there is a package or an existing function to do this, but it is not a difficult job to work out ourselves. The following small function does it:
LinearCombTest <- function (lmObject, vars, .vcov = NULL) {
  ## if `.vcov` missing, use the one returned by `lm`
  if (is.null(.vcov)) .vcov <- vcov(lmObject)
  ## estimated coefficients
  beta <- coef(lmObject)
  ## sum of `vars`
  sumvars <- sum(beta[vars])
  ## get standard errors for sum of `vars`
  se <- sum(.vcov[vars, vars]) ^ 0.5
  ## perform t-test on `sumvars`
  tscore <- sumvars / se
  pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
  ## return a matrix
  matrix(c(sumvars, se, tscore, pvalue), nrow = 1L,
         dimnames = list(paste0(paste0(vars, collapse = " + "), " = 0"),
                         c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
}
Let's test it:
data(mtcars)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
library(multiwayvcov)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)
If we leave .vcov unspecified in LinearCombTest, it is the same as multcomp::glht:
LinearCombTest(lm1, c("cyl","hp"))
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp = 0 -2.283815 0.5634632 -4.053175 0.0003462092
library(multcomp)
summary(glht(lm1, linfct = 'cyl + hp = 0'))
#Linear Hypotheses:
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp == 0 -2.2838 0.5635 -4.053 0.000346 ***
If we provide the clustered covariance matrix, it does what you want:
LinearCombTest(lm1, c("cyl","hp"), vcv)
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp = 0 -2.283815 0.7594086 -3.00736 0.005399071
Remark
LinearCombTest is extended in the answer to Get p-value for group mean difference without refitting linear model with a new reference level, where we can test an arbitrary linear combination with coefficients alpha:
alpha[1] * vars[1] + alpha[2] * vars[2] + ... + alpha[k] * vars[k]
rather than just the sum
vars[1] + vars[2] + ... + vars[k]
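A minimal sketch of such a generalization (my own assumption of how it could look, not the linked answer's exact code) is:
LinearCombTest2 <- function (lmObject, vars, alpha, .vcov = NULL) {
  ## if `.vcov` missing, use the one returned by `lm`
  if (is.null(.vcov)) .vcov <- vcov(lmObject)
  beta <- coef(lmObject)
  ## weighted combination alpha[1]*beta[vars[1]] + ... + alpha[k]*beta[vars[k]]
  est <- sum(alpha * beta[vars])
  ## Var(alpha' beta) = alpha' V alpha
  se <- sqrt(drop(t(alpha) %*% .vcov[vars, vars] %*% alpha))
  tscore <- est / se
  pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
  c(Estimate = est, `Std. Error` = se, `t value` = tscore, `Pr(>|t|)` = pvalue)
}
## e.g. test cyl - hp = 0 with the clustered covariance matrix:
## LinearCombTest2(lm1, c("cyl", "hp"), c(1, -1), vcv)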

Mean variance of a difference of BLUEs or BLUPs in `lme4`

Given below is the code for analysis of a resolvable alpha design (alpha lattice design) using the R package asreml.
# load the data
library(agridat)
data(john.alpha)
dat <- john.alpha
# load asreml
library(asreml)
# model1 - random `gen`
#----------------------
# fitting the model
model1 <- asreml(yield ~ 1 + rep, data=dat, random=~ gen + rep:block)
# variance due to `gen`
sg2 <- summary(model1)$varcomp[1,'component']
# mean variance of a difference of two BLUPs
vblup <- predict(model1, classify="gen")$avsed ^ 2
# model2 - fixed `gen`
#----------------------
model2 <- asreml(yield ~ 1 + gen + rep, data=dat, random = ~ rep:block)
# mean variance of a difference of two adjusted treatment means (BLUE)
vblue <- predict(model2, classify="gen")$avsed ^ 2
# H^2 = .803
sg2 / (sg2 + vblue/2)
# H^2c = .809
1-(vblup / 2 / sg2)
I am trying to replicate the above using the R package lme4.
# model1 - random `gen`
#----------------------
# fitting the model
library(lme4)
model1 <- lmer(yield ~ 1 + (1|gen) + rep + (1|rep:block), dat)
# variance due to `gen`
varcomp <- VarCorr(model1)
varcomp <- data.frame(print(varcomp, comp = "Variance"))
sg2 <- varcomp[varcomp$grp == "gen",]$vcov
# model2 - fixed `gen`
#----------------------
model2 <- lmer(yield ~ 1 + gen + rep + (1|rep:block), dat)
How can I compute vblup and vblue (the mean variance of a difference) in lme4, equivalent to predict()$avsed ^ 2 in asreml?
I'm not that familiar with this variance partitioning stuff, but I'll take a shot.
library(lme4)
model1 <- lmer(yield ~ 1 + rep + (1|gen) + (1|rep:block), john.alpha)
model2 <- update(model1, . ~ . + gen - (1|gen))
## variance due to `gen`
sg2 <- c(VarCorr(model1)[["gen"]]) ## 0.142902
Get conditional variances of BLUPs:
rr1 <- ranef(model1,condVar=TRUE)
vv1 <- attr(rr1$gen,"postVar")
str(vv1)
## num [1, 1, 1:24] 0.0289 0.0289 0.0289 0.0289 0.0289 ...
This is a 1x1x24 array (effectively just a vector of variances; we could collapse it using c() if we needed to). They're not all the same, but they're pretty close ... I don't know whether they should all be identical (in which case the differences are just roundoff).
(uv <- unique(vv1))
## [1] 0.02887451 0.02885887 0.02885887
The relative variation is approximately 5.4e-4 ...
If these were all the same then the mean variance of a difference of any two would be just twice the variance (Var(x-y) = Var(x)+Var(y); by construction the BLUPs are all independent). I'm going to go ahead and use this.
vblup <- 2*mean(vv1)
For the model with gen fitted as a fixed effect, let's extract the variances of the parameters relating to genotypes (which are differences in the expected value from the first level):
vv2 <- diag(vcov(model2))[-(1:3)]
summary(vv2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.06631 0.06678 0.07189 0.07013 0.07246 0.07286
I'm going to take the mean of these values (not doubled, since these are already variances of differences):
vblue <- mean(vv2)
sg2/(sg2+vblue/2) ## 0.8029779
1-(vblup/2/sg2) ## 0.7979965
The H^2 estimate looks right on, but the H^2c estimate is a little different (0.797 vs. 0.809, a 1.5% relative difference); I don't know if that is big enough to be of concern or not.

Bayesian ANCOVA in R via jags

I'm trying to implement a Bayesian ANCOVA in R using JAGS that accounts for heteroscedasticity. However, despite going through several tutorials on Bayesian simple regression and ANOVA, I can't understand how to prepare the file for JAGS. Here is my code so far:
n = 57
groups = 2
y1 = rexp(n, rate=0.8) # dependent variable
x1 = rbeta(n, 6, 2) # continuous covariate
x2 = rep(c(1, 2), length.out=n) # categorical factor
# list of variables
lddados <- list(g=groups, n=n, y=y1, x1=x1, x2=x2)
sink('reglin.txt') # file name here
cat('
# model
{
for(i in 1:n){
mu[i] = a0 + a[i]
y[i] = a0 + x1*a[ x2[i] ] + ε[i]
}
priors
y ~ dgamma(0.001,0.01)
for(i in 1:n){
inter[i] ~ dgamma(0.001,0.001)
coef[i] ~ dnorm(0.0,1.0E-
likelihood
got stuck...
}
}#------ end of model
')
sink()
I'm currently trying out ANCOVA using rjags myself...
From my understanding, I would try this (untested):
require(rjags)
require(coda)
model_string <- "
model {
for ( i in 1:n ){
mu[i] <- a0 + a[x2[i]] + a3 * x1[i] # linear predictor
y[i] ~ dnorm(mu[i], prec) # y is norm. dist.
}
# priors
a0 ~ dnorm(0, 1.0E-6) # intercept
a[1] ~ dnorm(0, 1.0E-6) # effect of x2 level 1
a[2] ~ dnorm(0, 1.0E-6) # effect of x2 level 2
a3 ~ dnorm(0, 1.0E-6) # regression coefficient for x1 (covariate)
prec ~ dgamma(0.001, 0.001) # precision (inverse of variance)
}
"
# data in the format JAGS expects (using the variables defined above)
data_list <- list(y = y1, x1 = x1, x2 = x2, n = n)
# initial values for the mcmc (one entry per model parameter)
inits_list <- list(a0 = 0, a = c(0, 0), a3 = 0, prec = 100)
# model, initial values and data in right format
jags_model <- jags.model(textConnection(model_string), data = data_list, inits = inits_list, n.adapt = 500, n.chains = 3, quiet = T)
# burn-in
update(jags_model, 10000)
# run the mcmc chains using the coda package
mcmc_samples <- coda.samples(jags_model, c("a0", "a", "a3", "prec", "mu"), n.iter = 100000)
Tell me if it works...
Recommended books: McCarthy M., Bayesian Methods for Ecology, and Kruschke J.K., Doing Bayesian Data Analysis.
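If the chains run, a quick way to summarize and check them (standard coda functions, nothing specific to this model) would be:
# posterior summaries for all monitored nodes
summary(mcmc_samples)
# potential scale reduction factors; values close to 1 suggest the chains have mixed
gelman.diag(mcmc_samples, multivariate = FALSE)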
