I'm trying to add a bit of code to a data-augmentation capture-recapture model and am coming up with some errors I haven't encountered before. In short, I want to estimate a series of survivorship phases that each last more than a single time interval. I want the model to estimate the length of each survivorship phase and use that to improve the capture-recapture model. I tried and failed with a few different approaches, and am now trying to accomplish this using a switching state array for the survivorship phases:
for (t in 1:(n.occasions-1)){
  phi1switch[t] ~ dunif(0,1)
  phi2switch[t] ~ dunif(0,1)
  phi3switch[t] ~ dunif(0,1)
  phi4switch[t] ~ dunif(0,1)
  # From state 1: stay in 1 or advance to 2
  psphi[1,t,1] <- 1-phi1switch[t]
  psphi[1,t,2] <- phi1switch[t]
  psphi[1,t,3] <- 0
  psphi[1,t,4] <- 0
  psphi[1,t,5] <- 0
  # From state 2: stay in 2 or advance to 3
  psphi[2,t,1] <- 0
  psphi[2,t,2] <- 1-phi2switch[t]
  psphi[2,t,3] <- phi2switch[t]
  psphi[2,t,4] <- 0
  psphi[2,t,5] <- 0
  # From state 3: stay in 3 or advance to 4
  psphi[3,t,1] <- 0
  psphi[3,t,2] <- 0
  psphi[3,t,3] <- 1-phi3switch[t]
  psphi[3,t,4] <- phi3switch[t]
  psphi[3,t,5] <- 0
  # From state 4: stay in 4 or advance to 5
  psphi[4,t,1] <- 0
  psphi[4,t,2] <- 0
  psphi[4,t,3] <- 0
  psphi[4,t,4] <- 1-phi4switch[t]
  psphi[4,t,5] <- phi4switch[t]
  # State 5 is absorbing
  psphi[5,t,1] <- 0
  psphi[5,t,2] <- 0
  psphi[5,t,3] <- 0
  psphi[5,t,4] <- 0
  psphi[5,t,5] <- 1
}
So this creates a 5 x (n.occasions-1) x 5 array where the survivorship state can only advance to the subsequent state, never backwards (e.g. 1 to 2, or 4 to 5, but not 4 to 3). Now I create a vector where the survivorship state is defined:
PhiState[1] <- 1
for (t in 2:(n.occasions-1)){
  # State process: draw PhiState[t] given PhiState[t-1]
  PhiState[t] ~ dcat(psphi[PhiState[t-1], t-1, ])
}
We always start in state 1 and then, at each time step t, take a categorical draw to either remain in the current state or move on to the next one, using the probabilities in the array. I want a maximum of 5 states; I'm assuming the model can functionally produce fewer, either by estimating the probability of moving from state 3 to 4 (and onwards) near 0, or by giving subsequent states the same or similar survivorship values when in reality they belong to one phase. So I create 5 hierarchical survival probabilities:
for (a in 1:5){
  mean.phi[a] ~ dunif(0,1)
  phi.tau[a] <- pow(phi.sigma[a], -2)
  phi.sigma[a] ~ dunif(0,20)
}
This next step is where the errors start. Having assigned values 1-5 to my PhiState vector, it should look something like this:
[1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 5
or maybe
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
and I now want to assign a mean.phi[] to my actual phi[] term, which feeds into the model:
for (t in 1:(n.occasions-1)){
  phi[t] ~ dnorm(mean.phi[PhiState[t]], phi.tau[PhiState[t]])
}
However, when I try to run this I get the following error:
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, :
RUNTIME ERROR:
Cannot insert node into mean.phi[1:5]. Dimension mismatch
It's worth noting that the model works just fine when I use the following phi[] determinations:
phi[t] ~ dunif(0,1) #estimate independent annual phi's
or
phi[t] ~ dnorm(mean.phi,phi_tau) #estimate hierarchical phi's from a single mean.phi
or
#Set fixed survival periods (this works the best, but I don't want to have to tell it when
#the periods start/end and how many there are, hence the current exercise):
for (a in 1:21){
  surv[a] ~ dnorm(mean.phi1, phi1_tau)
}
for (b in 22:30){
  surv[b] ~ dnorm(mean.phi2, phi2_tau)
}
for (t in 1:(n.occasions-1)){
  phi[t] <- surv[t]
}
I did read this post: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/36c48f25/
but I don't see where I'm redefining variables in this case... Any help fixing this or advice on a better approach would be most welcome!
Many thanks,
Josh
I'm a bit confused as to what your actual data are (the phi[t]?), but the following might give you a starting point:
nt <- 29
nstate <- 5
M <- function() {
  # Latent phase: start in state 1; at each step either stay or move up by one
  phi_state[1] <- 1
  for (t in 2:nt) {
    up[t-1] ~ dbern(p[t-1])
    p[t-1] <- ifelse(phi_state[t-1]==nstate, 0, p_[t-1]) # no moves past the last state
    p_[t-1] ~ dunif(0, 1)
    phi_state[t] <- phi_state[t-1] + equals(up[t-1], 1)
  }
  # Phase-specific hyperparameters
  for (k in 1:nstate) {
    mean_phi[k] ~ dunif(0, 1)
    phi_sigma[k] ~ dunif(0, 20)
  }
  # Each interval's survival is drawn from the phase it falls in
  for (t in 1:(nt-1)) {
    phi[t] ~ dnorm(mean_phi[phi_state[t]], phi_sigma[phi_state[t]]^-2)
  }
}
library(R2jags)
fit <- jags(list(nt=nt, nstate=nstate), NULL,
            c('phi_state', 'phi', 'mean_phi', 'phi_sigma', 'p'),
            M, DIC=FALSE)
Note that above, p is a vector of probabilities of moving up to the next (adjacent) state.
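If it's useful, here's one way to eyeball the results afterwards (just a sketch; fit is the R2jags object from the call above, and BUGSoutput is R2jags's standard results slot):
# Posterior mean of the latent phase at each occasion; a step-like
# pattern shows where the model places the phase changes
round(fit$BUGSoutput$mean$phi_state, 2)
# Summaries (including Rhat) for the phase-specific survival means
fit$BUGSoutput$summary[grep("^mean_phi", rownames(fit$BUGSoutput$summary)), ]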
I am trying to estimate a nested logit model as a sequence of MNLs for teaching purposes. At one point in this procedure (see Ben-Akiva & Lerman, 1986, Ch. 10, Section 10.4) the MNLs representing the nests are estimated jointly. The MNLs (i.e., the "nests") are represented by the corresponding choice sets.
However, when estimating the model, I get an error message due to singularity.
I tried the code below.
library(mlogit)
data("Fishing", package = "mlogit")
# nest1: beach & pier -> model 1
# nest2: boat & charter -> model 2
# joint estimation of model 1 and 2
# Ben Akiva & Lerman (1986), Chp. 10 (.4)
# availabilities according to models
Fishing$avail.beach <- 0
Fishing$avail.pier <- 0
Fishing$avail.boat <- 0
Fishing$avail.charter <- 0
Fishing$avail.charter[Fishing$mode == "charter" | Fishing$mode == "boat"] <- 1
Fishing$avail.boat[Fishing$mode == "charter" | Fishing$mode == "boat"] <- 1
Fishing$avail.pier[Fishing$mode == "pier" | Fishing$mode == "beach"] <- 1
Fishing$avail.beach[Fishing$mode == "pier" | Fishing$mode == "beach"] <- 1
names(Fishing)
Fish <- mlogit.data(Fishing, shape="wide", varying=c(2:9, 11:14), choice="mode")
fish.joint <- subset(Fish, avail ==1) # eliminate rows of non-available alts
head(model.matrix(mFormula(mode ~ price), fish.joint), n=4)
fish.1 <- mlogit(mode ~ price, fish.joint)
To me, the model.matrix looks identifiable
> head(model.matrix(mFormula(mode ~ price), fish.joint), n=4)
boat:(intercept) charter:(intercept) pier:(intercept) price
1.boat 1 0 0 157.930
1.charter 0 1 0 182.930
2.boat 1 0 0 10.534
2.charter 0 1 0 34.534
However,
> fish.1 <- mlogit(mode ~ price, fish.joint)
Error in solve.default(H, g[!fixed]) :
Lapack routine dgesv: system is exactly singular: U[2,2] = 0
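One structural thing worth checking: with the availabilities defined above, the two nests never share a choice set, so the reference alternative (beach) never co-occurs with boat or charter. In that case only within-nest intercept contrasts are identified, and three alternative-specific intercepts are one too many, which would make the Hessian singular. A quick base-R check of pairwise co-availability, using only the dummies created above (a sketch, not a fix):
# Entry [a, b] counts how often alternatives a and b are available in the
# same choice set; zero off-diagonal blocks mean the nests are disjoint
av <- as.matrix(Fishing[, c("avail.beach", "avail.pier", "avail.boat", "avail.charter")])
crossprod(av)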
I want to create a custom contrast function in emmeans that removes a given list of levels from the input vector and applies the built-in contrast method ("trt.vs.ctrl") to the remaining levels. An example dataset is available here. I am using the following R code for computing the ANOVA and post hoc comparisons:
library(car)     # for Anova()
library(emmeans)
options(contrasts = c("contr.sum", "contr.poly"))
my_lm <- lm(D1 ~ C*R, data = df)
Anova(my_lm, type = "III")
# Show interaction effects using emmeans
emmip(my_lm, C ~ R)
emm <- emmeans(my_lm, ~ C * R)
emm
contrast(emmeans(my_lm, ~ C * R), "consec", by = "C")
# Compare the 1st group with the next 3 (how to remove the other three levels?)
contrast(emmeans(my_lm, ~ C * R), "trt.vs.ctrl", by = "R")
The built-in contrast option ("trt.vs.ctrl") compares the first level with everything that follows it. There are 7 factor levels in C; I want to remove the last 3 of them and compute the contrasts for the remaining 4. The official documentation provides an example of writing a custom contrast function:
skip_comp.emmc <- function(levels, skip = 1, reverse = FALSE) {
  if ((k <- length(levels)) < skip + 1)
    stop("Need at least ", skip + 1, " levels")
  coef <- data.frame()
  coef <- as.data.frame(lapply(seq_len(k - skip - 1), function(i) {
    sgn <- ifelse(reverse, -1, 1)
    sgn * c(rep(0, i - 1), 1, rep(0, skip), -1, rep(0, k - i - skip - 1))
  }))
  names(coef) <- sapply(coef, function(x)
    paste(which(x == 1), "-", which(x == -1)))
  attr(coef, "adjust") = "fdr" # default adjustment method
  coef
}
However, due to my limited understanding, I am not sure where to make the modifications I need to customise the example. Any ideas?
Is this something you are going to want to do lots of times in the future? My guess is not; you probably only want to do this once, or a few times at most, in which case it is way too much trouble to write a custom contrast function. Just get the contrast coefficients you need, and use them as the second argument to contrast().
Now, consider these results:
> con <- emmeans:::trt.vs.ctrl.emmc(1:7)
> con
2 - 1 3 - 1 4 - 1 5 - 1 6 - 1 7 - 1
1 -1 -1 -1 -1 -1 -1
2 1 0 0 0 0 0
3 0 1 0 0 0 0
4 0 0 1 0 0 0
5 0 0 0 1 0 0
6 0 0 0 0 1 0
7 0 0 0 0 0 1
From the description, I think you just want the first 3 sets of contrast coefficients. So use those columns:
contrast(emm, con[, 1:3], by = "R")
Update
StackOverflow can occasionally inspire developers to add software features. In this case, I decided it could be useful to add an exclude argument to most built-in .emmc functions in emmeans (all except poly.emmc()). This was fairly straightforward to do, and those features are now incorporated in the latest push to github -- https://github.com/rvlenth/emmeans. These features will be included in the next CRAN update as well.
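With a version of emmeans that includes that change, something like the following should express the comparisons asked about directly (a sketch; emm is the emmeans object created earlier, and exclude is the new argument):
# Compare C level 1 against levels 2-4 only, within each R, dropping levels 5-7
contrast(emm, "trt.vs.ctrl", by = "R", exclude = 5:7)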
I'm writing my master's thesis and I'm stuck with the complexity of my data, so I'd like to plot the data to see what's in there.
My dataframe looks like this: I have 333 perceivers (PID) who each rated 60 target photos (TID), resulting in 19980 rows. Each perceiver (PID) rated every target's photo on how likeable it is (Rating) and provided multiple self-reports about themselves (SDO_mean, KSA_mean, threats_overall).
The photos were either of photo type A (Dwithin = 0) or type B (Dwithin = 1), which is my within-subject factor, as every perceiver saw all photos. In addition, perceivers were assigned to one of two between-subject conditions (Dbetween): all photos (TID) of type B (Dwithin = 1) were labeled either as people with a migration background (Dbetween = 0) or as refugees (Dbetween = 1).
This results in a nested design where the Ratings are nested in the PID and also in the TID. My data look like this:
TID PID Dwithin Dbetween Rating SDO_mean KSA_mean threats_overall
1 1 0 0 5 3.1 2.3 2.2
2 1 1 0 2 3.1 2.3 2.2
3 1 0 0 5 3.1 2.3 2.2
4 1 1 0 1 3.1 2.3 2.2
5 1 0 0 3 3.1 2.3 2.2
6 1 1 0 3 3.1 2.3 2.2
Now I want to predict the likeable rating mainly from the categorical variables Dwithin and Dbetween. As Dbetween can only be interpreted in interaction with Dwithin (because the label applied only to Dwithin = 1 targets), the formula would be:
model1 <- lmer(Rating ~ 1 + Dwithin*Dbetween + (1 + Dwithin|PID) + (1|TID), data = df)
Now I want to plot the data that I'm using for my regression. One option could be to plot the Rating separately for each Dwithin/Dbetween condition, or to plot the regression as in the model1 formula. But as these are categorical predictors, I didn't manage to plot the data the right way. I looked into lattice but couldn't apply it to my data. Is there anyone who could help me plot it? Thanks a lot in advance!
@SASpencer: I thought for example of something like this. But my y-scale isn't continuous... it only has integer values from 1-5. It could also be interesting for the combination of Dwithin and Dbetween (so like in your plot).
Here is a reproducible example:
# Draw n values from a normal(m, s) truncated to [lwr, upr] via rejection sampling
mysamp <- function(n, m, s, lwr, upr, nnorm) {
  set.seed(1)
  samp <- rnorm(nnorm, m, s)
  samp <- samp[samp >= lwr & samp <= upr]
  if (length(samp) >= n) {
    return(sample(samp, n))
  }
}
options(digits=2)
TID <- rep(1:60, times=333)
PID <- rep(1:333,each=60)
Dwithin <- rep(0:1, times = 19980/2)
Dbetween <- rep(rep(0:1, each=60),times=333)[1:19980]
Rating <- floor(runif(19980, min=1, max=6))
SDO_mean <- rep(mysamp(n=333, m=4, s=2.5, lwr=1, upr=5, nnorm=1000000), each=60)
KSA_mean <- rep(mysamp(n=333, m=2, s=0.8, lwr=1, upr=5, nnorm=1000000), each=60)
threats_overall <- rep(mysamp(n=333, m=3, s=1.5, lwr=1, upr=5, nnorm=1000), each=60)
df <- data.frame(TID,PID,Dwithin,Dbetween, Rating, SDO_mean, KSA_mean, threats_overall)
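For a first look, something along these lines with ggplot2 (an assumed choice of package; adapt as you like) plots the mean Rating for each Dwithin x Dbetween cell while respecting the integer 1-5 scale:
library(ggplot2)
# Mean rating (+/- SE) for each cell of the 2x2 design;
# factor() turns the 0/1 dummies into discrete axis/colour groups
ggplot(df, aes(x = factor(Dwithin), y = Rating, colour = factor(Dbetween))) +
  stat_summary(fun = mean, geom = "point", position = position_dodge(width = 0.3)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.15,
               position = position_dodge(width = 0.3)) +
  scale_y_continuous(limits = c(1, 5), breaks = 1:5) +
  labs(x = "Dwithin", colour = "Dbetween", y = "Rating (1-5)")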
I am trying to build a logistic regression model using the glm function in R. My dependent variable is binary, with 0 and 1 only; here 0 = non-return and 1 = return.
I want to model the non-returns (0's), but R's glm by default models the 1's. In SAS, proc logistic models the lower value by default and the descending option changes the order; is there something similar in glm?
One option would be to swap the 0's and 1's in my raw data, but I don't want to change the raw data.
Please help me, or advise how I can do the same thing in R.
Thanks in advance.
Just specify 1 - y as the DV:
set.seed(42)
y <- sample(c(0, 1), 10, TRUE)
#[1] 1 1 0 1 1 1 1 0 1 1
fit <- glm(y ~ 1, family = binomial)
coef(fit)
# (Intercept)
# 1.386294
log(mean(y) / (1 - mean(y)))
#[1] 1.386294
1 - y
#[1] 0 0 1 0 0 0 0 1 0 0
fit1 <- glm(1 - y ~ 1, family = binomial)
coef(fit1)
#(Intercept)
#-1.386294
log(mean(1 - y) / (1 - mean(1 - y)))
#[1] -1.386294
Alternatively, you can temporarily flip the response inside the call by using... transform, leaving the raw data unchanged (here dat stands for your data frame):
glm(y ~ 1, data = transform(dat, y = 1 - y), family = binomial)
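Another option, sketched below on the same simulated y, is to recode the response as a factor with the level order flipped, since for a factor response glm() treats the first level as failure and everything else as success:
yf <- factor(y, levels = c(1, 0)) # "0" (non-return) is now the success level
fit2 <- glm(yf ~ 1, family = binomial)
coef(fit2)
#(Intercept)
#  -1.386294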
I am trying to conduct a hierarchical Bayesian analysis but am having a little trouble with the R and WinBUGS code. I don't have balanced data and am struggling with the coding. I have temperature data collected daily with iButtons (temperature recording devices) in transects and am trying to build a model that relates this to remote sensing data. Unfortunately, each transect has a different number of iButtons, so creating a 3D matrix of button (i) in transect (j), repeatedly "sampled" on day (t), is a problem for me.
Ultimately, my model will be something like:
Level 1:
Temp[ijk] ~ N(theta[ijk], tau)
theta[ijk] = b0 + b1*x1 + ... + bn*xn
Level 2:
b0 = a00 + a01*y1 + ... + a0n*yn
b1 = a10 + a11*y1 + ...
Level 3 (maybe?): random level-2 intercepts
Normally I would do something like this:
Wide <- reshape(Data, idvar = c("iButton","block"), timevar = "julian", direction = "wide")
J <- length(unique(Data$block))
I <- length(unique(Data$iButton))
Ti <- length(unique(Data$julian))
Temp <- array(NA, dim = c(I, Ti, J))
for (j in 1:J) {
  sel.rows <- Wide$block == j
  Temp[, , j] <- as.matrix(Wide)[sel.rows, 3:(Ti + 2)] # first 2 columns are the id variables
}
Then I could have a 3D matrix that I could loop through in WinBUGS or OpenBUGS as such:
for (i in 1:J) { # Loop over transects/blocks
  for (j in 1:I) { # Loop over buttons
    for (t in 1:Ti) { # Loop over days
      Temp[i,j,t] ~ dnorm(theta[i,j,t], tau) # dnorm needs a precision as well
      theta[i,j,t] <- alpha.lam[i] + blam1*radiation[i,j] + blam2*cwd[i,j] + blam3*swd[i,j]
}}}
Anyway, don't worry about the details of the code above; it's just thrown together as an example from other analyses. My main question is how to do this type of analysis when I don't have a balanced design with equal numbers of iButtons per transect. Any help would be greatly appreciated. I'm clearly new to R and WinBUGS and don't have much previous computer coding experience.
Thanks!
Oh, and here is what the data look like in long (stacked) format:
> Data[1:15, 1:4]
iButton julian block aveT
1 1 1 1 -4.5000000
2 1 2 1 -5.7500000
3 1 3 1 -3.5833333
4 1 4 1 -4.6666667
5 1 5 1 -2.5833333
6 1 6 1 -3.0833333
7 1 7 1 -1.5833333
8 1 8 1 -8.3333333
9 1 9 1 -5.0000000
10 1 10 1 -2.4166667
11 1 11 1 -1.7500000
12 1 12 1 -3.2500000
13 1 13 1 -3.4166667
14 1 14 1 -2.0833333
15 1 15 1 -1.7500000
Create a vector or array of lengths and use subindexing.
Using your example:
J <- length(unique(Data$block))
I <- tapply(Data$iButton, Data$block, function(x) length(unique(x)))
Ti <- tapply(Data$julian, list(Data$block, Data$iButton), function(x) length(unique(x)))
for (i in 1:J) { # Loop over transects/blocks
  for (j in 1:I[i]) { # Loop over buttons in block i
    for (t in 1:Ti[i, j]) { # Loop over days for button j in block i
      Temp[i,j,t] ~ dnorm(theta[i,j,t], tau)
      theta[i,j,t] <- alpha.lam[i] + blam1*radiation[i,j] + blam2*cwd[i,j] + blam3*swd[i,j]
}}}
I think it would work, but I haven't tested it since there's no data.
Can you try using a list instead?
This allows a variable length for each item in the list, where each index corresponds to a transect.
So something like this:
theta <- list()
for (i in unique(Data$block)) {
  ibuttons <- unique(Data$iButton[Data$block==i])
  days <- unique(Data$julian[Data$block==i])
  theta[[i]] <- matrix(NA, length(ibuttons), length(days)) # Empty matrix with NA's
  for (j in 1:length(ibuttons)) {
    for (t in 1:length(days)) {
      theta[[i]][j,t] <- fn(i, ibuttons[j], days[t]) # fn() is a placeholder for your own calculation
    }
  }
}
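For what it's worth, another standard way around ragged arrays in WinBUGS/JAGS is to skip the 3D matrix entirely: keep the data in the long (stacked) format already shown in the question and loop over rows, using nested indexing to pick up the right block-level parameters. A sketch of the model fragment (assuming N = nrow(Data) and block = Data$block are passed as data, and that the covariates are also supplied per row; the coefficient names follow the question's example):
for (n in 1:N) {
  aveT[n] ~ dnorm(theta[n], tau)
  theta[n] <- alpha.lam[block[n]] + blam1*radiation[n] + blam2*cwd[n] + blam3*swd[n]
}
for (i in 1:J) {
  alpha.lam[i] ~ dnorm(0, 0.001) # random intercept per transect/block
}
tau ~ dgamma(0.001, 0.001)
This sidesteps the unbalanced design because every observed row simply carries its own indices; no padding or NA handling is needed.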