R, Bayesian statistics, and JAGS newbie here. I'm working on modeling some right-censored count data, and Poisson seems to be my best guess. I want to use a hierarchical model, as it gives me more possibilities to fine-tune the parameters. Can I simply write something like this:
A[i,j] ~ dpois(a[i,j])
a[i,j] <- b[i,]*x[i,j] + c[i] for all j,
where x[i,j] are my variables, or should I separate the censored time interval from the previous ones or something?
b[,] and c have a prior.
Thank you!
It is not clear to me what is supposed to be hierarchical here.
You can have the time effect separated from the covariate effect, in which case the covariate effect is not related to the station.
Moreover, the mean of your Poisson GLM must be positive, so you should not plug the linear predictor in directly; use a log link instead. Look here: http://www.petrkeil.com/?p=1709
A suggestion for you could be:
b1 ~ prior
b2 ~ prior
for (i in 1:n_stations) {
  c[i] ~ prior
}
for (t in 1:n_time) {
  b_time[t] ~ prior
  for (i in 1:n_stations) {
    A[i,t] ~ dpois(a[i,t])
    log(a[i,t]) <- b1*b_time[t]*X1[i,t] + b2*b_time[t]*X2[i,t] + c[i]
  }
}
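If it helps, here is a minimal sketch of how a model like this might be fitted from R with rjags. The data objects (A, X1, X2) and the vague dnorm priors are placeholders standing in for your actual data and whatever priors you prefer.

library(rjags)

model_string <- "
model {
  b1 ~ dnorm(0, 0.001)
  b2 ~ dnorm(0, 0.001)
  for (i in 1:n_stations) {
    c[i] ~ dnorm(0, 0.001)
  }
  for (t in 1:n_time) {
    b_time[t] ~ dnorm(0, 0.001)
    for (i in 1:n_stations) {
      A[i, t] ~ dpois(a[i, t])
      log(a[i, t]) <- b1 * b_time[t] * X1[i, t] + b2 * b_time[t] * X2[i, t] + c[i]
    }
  }
}"

# A is the n_stations x n_time count matrix; X1 and X2 are covariate matrices of the same shape
jags_data <- list(A = A, X1 = X1, X2 = X2,
                  n_stations = nrow(A), n_time = ncol(A))
fit <- jags.model(textConnection(model_string), data = jags_data, n.chains = 2)
samples <- coda.samples(fit, c("b1", "b2", "b_time", "c"), n.iter = 5000)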
I'm new to applying splines to longitudinal data, so here comes my question:
I have some longitudinal data on growing mice at 3 timepoints: x, y, and z months. It's known from the existing literature that growth trajectories in this type of data are usually better modeled in non-linear terms.
However, since I have only 3 timepoints, I wonder whether this allows me to apply a natural quadratic spline to the age variable in my lmer model?
Edit: I mean, is
lmer <- mincLmer(File ~ ns(Age, 2) * Genotype + Sex + (1 | Subj_ID), data, mask = mask)
a legitimate way to go about it?
I'm sorry if this is a stupid question - I'm just a lonely PhD student without supervision, and I would be super-grateful for any advice!!!
Marina
With the nls() function you can fit your data to whatever non-linear function you want. From the biological point of view, your data are probably described by a Gompertz-like (sigmoidal) function, but since you have only three time points, you can probably simplify this kind of function to an exponential one. Try the following:
fit_formula <- dependent_variable ~ a * exp(b * independent_variable)
result <- nls(formula = fit_formula, data = your_Dataset)
It will probably give you an error the first time, something like "singular gradient matrix at initial parameter estimates"; if this happens, add the start argument, where you provide starting values for a and b closer to the true values. Remember that in your dataset, the column names must match the variable names in the formula.
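For instance, a sketch with made-up column names, data, and starting values (replace age, weight, and the start values with your own):

your_Dataset <- data.frame(age    = c(1, 2, 3, 1, 2, 3),            # months, hypothetical
                           weight = c(5.1, 8.9, 15.2, 4.8, 9.3, 14.7))
fit_formula <- weight ~ a * exp(b * age)
result <- nls(formula = fit_formula, data = your_Dataset,
              start = list(a = 5, b = 0.5))                          # rough guesses for a and b
summary(result)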
I'm attempting to do this in JAGS:
z[l] ~ dbeta(0.5,0.5)
y[i,l] ~ z[l]*dnorm(0,10000) + inprod(1-z[l],dnegbin(exp(eta_star[i,l]),alpha[l]))
(dnorm(0,10000) models a Dirac delta at 0: see here if you are interested in the model).
But I get:
RUNTIME ERROR:
Incorrect number of arguments in function dnegbin
But if I do this:
y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
It runs just fine. I gather that I cannot multiply a distribution by a value, so I imagined that something like this could work:
z[l] ~ dbeta(0.5,0.5)
pointmass_0[l] ~ dnorm(0,10000)
y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
ystar[i,l] <- z[l]*pointmass_0[l] + inprod(1-z[l], y[i,l])
If I run that I get:
ystar[1,1] is a logical node and cannot be observed
You are looking to fit a zero-inflated negative binomial model. You can do this in JAGS if you use the "ones trick", a pseudo-likelihood method that can be used when the distribution of your outcome variables is not one of the standard distributions in JAGS but you can still write down an expression for the likelihood.
The "ones trick" consists of creating pseudo-observations with the value 1. These are then modeled as Bernoulli random variables with probability parameter Lik/C, where Lik is the likelihood of your observations and C is a large constant that ensures Lik/C << 1.
data {
  C <- 10000
  for (i in 1:N) {
    one[i,1] <- 1
  }
}
model {
  for (i in 1:N) {
    one[i,1] ~ dbern(lik[i,1] / C)
    lik[i,1] <- (y[i,1] == 0) * z[1] + (1 - z[1]) * lik.NB[i,1]
    lik.NB[i,1] <- dnegbin(y[i,1], exp(eta_star[i,1]), alpha[1])
  }
  z[1] ~ dbeta(0.5, 0.5)
}
Note that the name dnegbin is overloaded in JAGS. There is a distribution that has two parameters and a function that takes three arguments and returns the likelihood. We are using the latter.
I am thinking of adding zero-inflated versions of count distributions to JAGS, since the above construction is quite awkward for the user, whereas zero-inflated distributions are quite easy to implement internally in JAGS.
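For what it's worth, here is a self-contained sketch of the ones trick run from R with rjags on simulated data. The priors, the simulated parameter values, and the simple intercept-free negative binomial part are illustrative assumptions, not something taken from the question.

library(rjags)

set.seed(1)
N <- 200
zero <- rbinom(N, 1, 0.3)                                   # structural zeros
y <- ifelse(zero == 1, 0, rnbinom(N, size = 2, mu = 4))     # zero-inflated NB counts

zinb_model <- "
data {
  C <- 10000
  for (i in 1:N) { one[i] <- 1 }
}
model {
  z ~ dbeta(0.5, 0.5)             # zero-inflation probability
  p ~ dbeta(1, 1)                 # NB probability parameter
  r ~ dgamma(0.01, 0.01)          # NB size parameter
  for (i in 1:N) {
    lik.NB[i] <- dnegbin(y[i], p, r)
    lik[i] <- (y[i] == 0) * z + (1 - z) * lik.NB[i]
    one[i] ~ dbern(lik[i] / C)
  }
}"

fit <- jags.model(textConnection(zinb_model), data = list(y = y, N = N), n.chains = 2)
post <- coda.samples(fit, c("z", "p", "r"), n.iter = 5000)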
I too would like to know a better way to handle this situation.
One cheesy solution is to add a stochastic node
ystarstar[i,j] ~ dnorm(ystar[i,j],10000000)
(i.e. a Normal distribution with a very high precision, or a Dirac delta in your terminology) to the model.
ref: http://www.r-bloggers.com/r-for-ecologists-putting-together-a-piecewise-regression/
In this post, I am confused about this formula:
y ~ x*(x < breaks[i]) + x*(x>=breaks[i])
in lm().
I know that * in lm() means main effects plus interactions, so does this mean the predictors are x, (x < breaks[i]), (x >= breaks[i]), and their interactions with x?
This is a method of doing "segmented" regression. You are essentially creating two different models, one for the section where x < breaks[i] and another for the section where the opposite is true. In this case the * functions as a multiplier rather than as an interaction operator, because the logical values are {0, 1}, so there won't be a two-level result. The webpage seems to do a pretty nice job of illustrating this, so it's unclear what is missing. The model formula might be clearer if it were written as:
y ~ x*I(x < breaks[i]) + x*I(x>=breaks[i])
It essentially means that there are two predictors: the first one being x and the second one being a logical vector that is 1 in the region less than breaks[i] and 0 in the other region. In fact you probably would not need two terms in the model if you just used:
y ~ x*I(x < breaks[i])
I thought the predictions would be the same, but they were slightly different, perhaps because the two-term model implicitly allowed completely independent intercepts.
There are also the segmented and strucchange packages, which support segmented regression.
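As an illustration (simulated data with an arbitrarily chosen breakpoint, not from the original post), you can fit both formulations and compare their fitted values directly:

set.seed(42)
x <- seq(0, 10, by = 0.1)
y <- ifelse(x < 5, 2 + 1.5 * x, 9.5 + 0.3 * (x - 5)) + rnorm(length(x), sd = 0.5)
brk <- 5                                                   # assumed breakpoint

fit_two <- lm(y ~ x * I(x < brk) + x * I(x >= brk))        # two-segment formulation from the post
fit_one <- lm(y ~ x * I(x < brk))                          # single-interaction formulation
head(cbind(two_term = fitted(fit_two), one_term = fitted(fit_one)))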
I have some data from a Poisson distribution and a simple equation I want to solve using glm.
The mathematical equation is observed = y * expected.
I have the observed and expected data and want to use glm to find the optimal value of y which I need to multiply expected by to get observed. I also want to get confidence intervals for y.
Should I be doing something like this
glm(observed ~ expected + offset(log(expected)) + 0, family = 'poisson', data = dataDF)
Then taking the exponential of the coefficient? I tried this, but the value given is pretty different from what I get when I divide the sum of the observed by the sum of the expected, and I thought these should be similar.
Am I doing something wrong?
Thanks
Try this:
logFac <- coef(glm(observed ~ offset(log(expected)), family = 'poisson', data = dataDF))
Fac <- exp(logFac[1])   # that's the intercept term
That model is really observed ~ 1 + offset(log(expected)), and since it is estimated on the log scale, exponentiating the intercept gives the conversion factor between 'expected' and 'observed'. The negative comments are evidence that you should have posted on CrossValidated.com, where general statistics methodology questions are more welcome.
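As a quick sanity check, here is a sketch on simulated data (the true multiplier of 1.7 is chosen arbitrarily for illustration); the exponentiated intercept should land very close to the simple sum(observed)/sum(expected) ratio you mention:

set.seed(1)
expected <- runif(500, 1, 20)
observed <- rpois(500, lambda = 1.7 * expected)             # true multiplier y = 1.7, assumed
dataDF   <- data.frame(observed, expected)

fit <- glm(observed ~ 1 + offset(log(expected)), family = 'poisson', data = dataDF)
exp(coef(fit)[1])                                           # estimated multiplier
sum(observed) / sum(expected)                               # simple ratio, for comparison
exp(confint.default(fit))                                   # Wald CI on the log scale, exponentiated to give a CI for y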
I've run an anova using the following code:
aov2 <- aov(amt.eaten ~ salt + Error(bird / salt),data)
If I use view(aov2) I can see the residuals within the structure of aov2, but I would like to extract them in a way that doesn't involve cutting and pasting. Can someone help me out with the syntax?
The various versions of residuals(aov2) I have tried only produce NULL.
I just learned that you can use proj():
x1 <- gl(8, 4)
block <- gl(2, 16)
y <- as.numeric(x1) + rnorm(length(x1))
d <- data.frame(block, x1, y)
m <- aov(y ~ x1 + Error(block), d)
m.pr <- proj(m)                     # projections onto each error stratum
m.pr[['Within']][, 'Residuals']     # residuals from the Within stratum
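Applied to the model in the question, the same idea would look roughly like this (assuming your data frame has the columns amt.eaten, salt, and bird; the name of the lowest stratum depends on your design, so check names(aov2.pr) first):

aov2 <- aov(amt.eaten ~ salt + Error(bird / salt), data)
aov2.pr <- proj(aov2)
names(aov2.pr)                                  # lists the error strata that are present
aov2.pr[['Within']][, 'Residuals']              # residuals, if a 'Within' stratum exists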
The reason you cannot extract residuals from this model with residuals() is that you have specified an error structure via bird / salt (salt nested within bird). Here, each unique combination of bird and salt is treated as a random cluster with its own intercept but a common additive effect associated with a unit difference in salt and the amount eaten.
I can't see why you would want to specify this as a random effect in this model. But in order to sensibly analyze residuals, you may want to calculate fitted differences in each stratum according to the fitted model and the estimated intercepts. I think this is tedious work and not very informative, however.