How to write multiple random intercepts in lqmm? - r

I'm trying to define a linear quantile mixed model using the lqmm package with multiple random intercept terms. However, I cannot find the correct syntax to do it.
1. Is it possible to do this with the lqmm package?
2. If yes, do you know the correct syntax?
3. If not, do you know of another package (and its syntax) that can?
Examples of the syntax I have already tried:
mod <- lqmm(fixed = Y ~ log10(X), random = ~ list(1,1), group = list(site.f,spp.f),
tau = 0.95, nK = 7, type = "normal", data = data_s)
It returns: Error in model.frame.default(groupFormula, dataMix) : invalid type (list) for variable 'list(site.f, spp.f)'
mod <- lqmm(fixed = Y ~ log10(X), random = ~ 1, group = site.f + spp.f,
tau = 0.95, nK = 7, type = "normal", data = data_s)
It returns:
Error in rep(weights, table(grp)) : invalid 'times' argument
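One plausible reading of the second error (an assumption, not a diagnosis of lqmm internals): group = site.f + spp.f is evaluated as an ordinary arithmetic sum of two factors, which R does not define. The base-R sketch below uses made-up stand-ins for site.f and spp.f to show that the sum collapses to NA, so table(grp) has nothing to count. That is consistent with the invalid 'times' message from rep(weights, table(grp)).

```r
# Hypothetical stand-ins for the site.f and spp.f factors in the question
site.f <- factor(c("A", "A", "B", "B"))
spp.f  <- factor(c("x", "y", "x", "y"))

# '+' is not meaningful for factors: R warns and returns NA for every element
grp_sum <- suppressWarnings(site.f + spp.f)
print(grp_sum)

# table() drops NAs by default, so there are no group sizes left to use
print(length(table(grp_sum)))

# interaction() is one way to build a *single* grouping factor; note that this
# gives one random intercept per site-by-species cell, not two separate intercepts
site_spp <- interaction(site.f, spp.f, drop = TRUE)
print(levels(site_spp))
```

Whether a combined factor like site_spp is an acceptable substitute depends on the design. It is not equivalent to two crossed random intercepts.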
Thanks a lot for your help
Vincent

Related

How to fix "variable lengths differ" error in cv.zipath?

I am trying to run cross-validation of a zero-inflated Poisson model using cv.zipath from the mpath package.
Fitting the LASSO
fit.lasso = zipath(estimation_sample_nomiss ~ .| .,
data = missings,
nlambda = 100,
family = "poisson",
link = "logit")
Cross validation
n <- dim(docvisits)[1]
K <- 10
set.seed(197)
foldid <- split(sample(1:n), rep(1:K, length = n))
fitcv <- cv.zipath(F_time_unemployed~ . | .,
data = estimation_sample_nomiss, family = "poisson",
nlambda = 100, lambda.count = fit.lasso$lambda.count[1:30],
lambda.zero = fit.lasso$lambda.zero[1:30], maxit.em = 300,
maxit.theta = 1, theta.fixed = FALSE, penalty = "enet",
rescale = FALSE, foldid = foldid)
I encounter the following error:
Error in model.frame.default(formula = F_time_unemployed ~ . + ., data = list(: variable lengths differ (found for '(weights)')
I have cleaned the sample of all NAs but still encounter the error message.
The solution turns out to be that cv.zipath() does not accept tibbles, at least in this instance (no guarantee as to how far this generalises). After using dplyr commands, one needs to convert back to a plain data frame, so the solution is as simple as as.data.frame().
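A minimal sketch of the conversion step. It assumes tibble is not loaded and only mimics a dplyr result by giving a plain data frame tibble's class vector; the column age and all values are hypothetical.

```r
# Stand-in for a dplyr result: a data frame carrying tibble's class vector
est <- data.frame(F_time_unemployed = c(0, 3, 1, 0),
                  age = c(23, 41, 35, 52))   # 'age' is a hypothetical predictor
class(est) <- c("tbl_df", "tbl", "data.frame")

# Convert back to a plain data.frame before handing it to cv.zipath()
if (inherits(est, "tbl_df")) est <- as.data.frame(est)
print(class(est))

# The call from the question would then use the converted object, with the
# remaining arguments unchanged:
# fitcv <- cv.zipath(F_time_unemployed ~ . | ., data = est, family = "poisson")
```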

Meaning of "trait" in MCMCglmm

As in this post, I'm struggling with the notation of MCMCglmm, especially with what is meant by trait. My code is the following:
library("MCMCglmm")
set.seed(123)
y <- sample(letters[1:3], size = 100, replace = TRUE)
x <- rnorm(100)
id <- rep(1:10, each = 10)
dat <- data.frame(y, x, id)
mod <- MCMCglmm(fixed = y ~ x, random = ~us(x):id,
data = dat,
family = "categorical")
This gives me the error message "For error structures involving catgeorical data with more than 2 categories pleasue use trait:units or variance.function(trait):units." (sic). If I generate dichotomous data with letters[1:2], everything works fine. So what does this error message mean in general, and what is "trait" in particular?
Edit 2016-09-29:
From the linked question I copied rcov = ~ us(trait):units into my call of MCMCglmm. And from https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004006.html I took (and slightly modified) the prior:
prior <- list(R = list(V = diag(2), fix = 1), G = list(G1 = list(V = diag(2), nu = 1, alpha.mu = c(0, 0), alpha.V = diag(2) * 100)))
Now my model actually gives results:
MCMCglmm(fixed = y ~ 1 + x, random = ~us(1 + x):id,
rcov = ~ us(trait):units, prior = prior, data = dat,
family = "categorical")
But I still don't understand what is meant by trait (or by units, or the notation of the prior, or how us() differs from idh(), and so on).
Edit 2016-11-17:
I think trait is synonymous with "target variable" or "response" in general, or y in this case. In the formula for random there is nothing on the left side of ~ "because the response is known from the fixed effect specification." So the rationale for requiring trait:units in rcov could be that what trait refers to (y in this case) is already defined by the fixed formula.
units indexes the individual observations (one level per row of the data, i.e. the residual level), and trait indexes the response variables, which for a categorical response correspond to the categories. By specifying rcov = ~us(trait):units, you allow the residual variance to be heterogeneous across "traits" (response categories), so that all elements of the residual variance-covariance matrix are modelled.
In Section 5.1 of Hadfield's MCMCglmm Course Notes (vignette("CourseNotes", "MCMCglmm")) you can read an explanation for the reserved variables trait and units.
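To make the us() versus idh() distinction concrete, here is a small base-R calculation (an illustration, not MCMCglmm code). It assumes the usual MCMCglmm parameterisation in which a categorical response with K categories is modelled through K - 1 latent traits, which matches the diag(2) matrices in the prior above for K = 3. The matrix values and trait labels are made up.

```r
# K categories -> K - 1 latent "traits" (each contrasted with the baseline category)
K <- 3
n_traits <- K - 1

# Free (co)variance parameters in a residual structure over those traits
n_par_us  <- n_traits * (n_traits + 1) / 2  # us(trait): all variances and covariances
n_par_idh <- n_traits                       # idh(trait): variances only, covariances fixed at 0

# Illustrative matrices with made-up values and hypothetical trait labels
trait_lab <- paste0("trait", seq_len(n_traits))
V_us  <- matrix(c(1.0, 0.3, 0.3, 1.5), nrow = n_traits,
                dimnames = list(trait_lab, trait_lab))
V_idh <- diag(c(1.0, 1.5))
dimnames(V_idh) <- list(trait_lab, trait_lab)

print(c(us = n_par_us, idh = n_par_idh))
```

The gap between the two counts grows quadratically with the number of traits, which is why idh() is often the more economical choice when covariances between traits are not of interest.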

non-linear optimization in R using optim

I'm a newbie in R!
I would like to find the best gamma distribution parameters to fit my experimental count data. The optim help page says the first argument of the function should be the parameters to be optimized. So I tried:
x = as.matrix(seq(1,20,0.1))
yexp = dgamma(x,2,1)*100 + rnorm(length(x),0,1)
f = function(p,x,yexp) {sum((p[1]*dgamma(x,p[2],scale=p[3]) - yexp)^2)}
mod = optim(c(50,2,1),f(p,x,yexp))
I get the error message:
Error in f(p, x, yexp) : object 'p' not found
Any hint where I'm wrong?
Supplementary question: is there any other way to fit count data with standard distributions (gamma, inverse Gaussian, etc.)?
optim expects its second argument to be a function. Also, the second and third arguments to f are fixed and need to be specified:
optim(c(50, 1, 2), f, x = x, yexp = yexp)
This would also work:
optim(c(50, 1, 2), function(p) f(p, x, yexp))
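A self-contained version of the corrected call, using simulated data as in the question. set.seed(1) and the positivity guard inside f are additions (assumptions, not part of the original code): the seed makes the run reproducible, and the guard keeps dgamma() from producing NaN warnings when Nelder-Mead probes negative shape or scale values. Both ways of supplying the fixed data should give identical results.

```r
set.seed(1)  # added for reproducibility; any seed works
x    <- seq(1, 20, 0.1)
yexp <- dgamma(x, 2, 1) * 100 + rnorm(length(x), 0, 1)

# Sum of squared errors; p = (amplitude, shape, scale).
# Added guard: reject non-positive shape/scale (Nelder-Mead tolerates Inf).
f <- function(p, x, yexp) {
  if (p[2] <= 0 || p[3] <= 0) return(Inf)
  sum((p[1] * dgamma(x, p[2], scale = p[3]) - yexp)^2)
}

start <- c(50, 2, 1)

# Pass the function itself; x and yexp are forwarded through optim's '...'
fit1 <- optim(start, f, x = x, yexp = yexp)

# Equivalent: an anonymous function that closes over x and yexp
fit2 <- optim(start, function(p) f(p, x, yexp))

print(fit1$par)
```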
You could also use nls with its default Gauss-Newton algorithm:
nls(yexp ~ a * dgamma(x, sh, scale=sc), start = list(a = 50, sh = 2, sc = 1))
or with plinear in which case no starting value is needed for the first parameter:
nls(c(yexp) ~ dgamma(x, sh, scale=sc), start = list(sh = 2, sc = 1), alg = "plinear")

Linear quantile mixed model with the R package lqmm: Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1)

I want to compute linear quantile mixed models, but I always get the following error:
Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1)
To reproduce, please download the dataset and import it:
https://dl.dropboxusercontent.com/u/79415744/mixedModelDataSet.txt
stackoverflow <- read.table("mixedModelDataSet.txt", sep="\t", header = TRUE ) # import
Then try to compute the model:
require("lqmm")
stack15 <- lqmm(gsDeviationMio ~ aoi, random = ~ 1, group = vpName, data = stackoverflow, tau = 0.15)
What am I doing wrong?
Computing a (non-quantile) linear mixed model with lme works:
library(nlme)
stackLme <- lme(gsDeviationMio ~ aoi, random = ~ 1|vpName, data = stackoverflow)
Thanks a lot for your help!
Best,
Florian
Here is the answer from Marco Geraci (author of lqmm):
There's a problem with the scale of the response. Also, the 'gs' algorithm seems to have some issues with this dataset. Try the following:
stackoverflow$y <- scale(stackoverflow$gsDeviationMio, center = TRUE, scale = TRUE)
lqmm(y ~ aoi, random = ~ 1, group = vpName, data = stackoverflow, tau = 0.15, control = lqmmControl(method = "df", UP_max_iter = 200))
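Standardizing the response changes the units of the fixed-effect estimates. Because quantiles are equivariant under increasing affine transformations, estimates fitted to the z-scored response can in principle be mapped back to the original units (up to numerical differences between the two fits). A minimal base-R sketch; the response values and the coefficients, including the name aoiB, are hypothetical.

```r
# Hypothetical values standing in for stackoverflow$gsDeviationMio
y <- c(1.2e6, 3.4e6, 2.1e6, 5.6e6, 4.0e6)

m <- mean(y)
s <- sd(y)

# scale(center = TRUE, scale = TRUE) is the ordinary z-score (y - mean) / sd;
# as.numeric() drops the matrix shape and attributes that scale() returns
z <- as.numeric(scale(y, center = TRUE, scale = TRUE))

# Made-up fixed-effect estimates from a model fitted to z (aoiB is hypothetical)
beta_z <- c("(Intercept)" = -0.35, aoiB = 0.80)

# Back to the original units: intercept -> m + s * b0, slopes -> s * b
beta_y <- c(m + s * beta_z[1], s * beta_z[-1])
print(beta_y)
```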

Piecewise linear regression in R (segmented.lm)

I would appreciate any help getting segmented.lm (or any other function) to find the obvious breakpoints in this example:
data = list(x=c(50,60,70,80,90) , y= c(703.786,705.857,708.153,711.056,709.257))
plot(data, type='b')
require(segmented)
model.lm = segmented(lm(y~x,data = data),seg.Z = ~x, psi = NA)
It returns with the following error:
Error in solve.default(crossprod(x1), crossprod(x1, y1)) :
system is computationally singular: reciprocal condition number = 1.51417e-20
If I change K:
model.lm = segmented(lm(y~x,data = data),seg.Z = ~x, psi = NA, control = seg.control(K=1))
I get another error:
Error in segmented.lm(lm(y ~ x, data = data), seg.Z = ~x, psi = NA, control = seg.control(K = 1)) :
only 1 datum in an interval: breakpoint(s) at the boundary or too close each other
A nice objective method to determine the break point is described in Crawley (2007: 427).
First, define a vector breaks for a range of potential break points:
breaks <- data$x[data$x >= 70 & data$x <= 90]
Then run a for loop that fits a piecewise regression for every potential break point, and extract the residual standard error of each model from the summary output (stored in mse):
mse <- numeric(length(breaks))
for(i in seq_along(breaks)){
  # separate lines of y on x below and above the candidate break point
  piecewise <- lm(data$y ~ data$x*(data$x < breaks[i]) + data$x*(data$x >= breaks[i]))
  # residual standard error of this fit
  mse[i] <- summary(piecewise)$sigma
}
Finally, identify the break point with the least mse:
breaks[which(mse==min(mse))]
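As an alternative to the discontinuous two-line fits in the loop, a continuous broken-stick model (closer to what segmented fits) can be searched over a grid of break points using a hinge term pmax(x - b, 0). This is a swapped-in technique, not Crawley's code, and with only five points any break point estimate is fragile. The sketch uses the question's data.

```r
# Data from the question
data <- data.frame(x = c(50, 60, 70, 80, 90),
                   y = c(703.786, 705.857, 708.153, 711.056, 709.257))

# Interior candidates only: a hinge at 50 is collinear with x, at 90 it is all zeros
candidates <- c(60, 70, 80)

# Residual sum of squares of the continuous fit y = a + b1 * x + b2 * max(x - b, 0)
hinge_rss <- function(b) {
  fit <- lm(y ~ x + pmax(x - b, 0), data = data)
  sum(residuals(fit)^2)
}

rss <- vapply(candidates, hinge_rss, numeric(1))
best_break <- candidates[which.min(rss)]
best_break
# [1] 80
```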
Hope this helps.
