Bernoulli GLM in JAGS (R)

I am getting mismatched-parameter errors for one of the predictor variables in my JAGS code, and two of my variables are not converging.
I am analyzing a data set of 1000 patients (observations), coded 1/0 for whether or not they had epilepsy. The other variables are categorical: gender (male, female, LGBTQ) and drug type used (diclofenac (1), bicarbonate (2), thiazide (3)). There is also a continuous variable, BMI, ranging from 10 to 40. Below is my code. What could be wrong with it?
"model{
for (i in 1:n)
{
epilepsy[i]~dbern(prob[i])
logit(prob[i])<- intercept+gendereffect[gender[i]] + druguseeffect[Druguse[i]]+ BMIeffect*BMI[i]+
interaction*gender[i]*Druguse[i]}
intercept~dnorm(0,precision)
gendereffect[1]<-0
gendereffect[2]~dnorm(0,10^6)
gendereffect[3]~dnorm(0,10^6)
druguseeffect[1]<-0
druguseeffect[2] ~ dnorm(0,10^6)
druguseeffect[3]~dnorm(0,10^6)
intercept~dbeta(1,1)
BMIeffect~dnorm(0,10^6)
interaction~dnorm(0,10^6)
precision~dgamma(10,10)
#data# druguse, gender, BMI,n,epilepsy
#inits# intercept,BMIeffect,gendereffect,druguseeffect,interaction,precision
#monitor# gendereffect,BMIeffect,druguseeffect,interaction,precision
}"
Note: I have created the data list, and in R I converted druguse and gender to integers, both coded 1:3. But the model is not running.
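For what it's worth, two problems are visible in the listing itself: intercept is given two priors (dnorm and dbeta), which JAGS rejects with an "Attempt to redefine node" error, and the #data# line passes druguse while the likelihood indexes Druguse. Note also that dnorm() in JAGS takes a precision, not a variance, so dnorm(0, 10^6) is an extremely tight prior; 10^-6 gives a vague one. Below is a minimal sketch with those points addressed; the prior choices are illustrative only, and the numeric interaction term is dropped because an interaction between two categorical variables needs a matrix of effects (e.g. interactioneffect[gender[i], druguse[i]]), not a product of the category codes.
model <- "model{
  for (i in 1:n) {
    epilepsy[i] ~ dbern(prob[i])
    logit(prob[i]) <- intercept + gendereffect[gender[i]] +
                      druguseeffect[druguse[i]] + BMIeffect*BMI[i]
  }
  intercept ~ dnorm(0, 1.0E-6)         # exactly one prior per node
  gendereffect[1] <- 0                 # reference level
  gendereffect[2] ~ dnorm(0, 1.0E-6)
  gendereffect[3] ~ dnorm(0, 1.0E-6)
  druguseeffect[1] <- 0                # reference level
  druguseeffect[2] ~ dnorm(0, 1.0E-6)
  druguseeffect[3] ~ dnorm(0, 1.0E-6)
  BMIeffect ~ dnorm(0, 1.0E-6)
  #data# druguse, gender, BMI, n, epilepsy
  #monitor# intercept, gendereffect, BMIeffect, druguseeffect
}"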

Related

Using predict in metafor when each author has multiple rows in the data

I'm running a meta-analysis where I'm interested in the effect of X on the effect of age on habitat use (raw mean values and variances) using the metafor package.
An example of one of my models is:
mod6 <- rma.mv(
  yi     = Used_value,
  V      = Used_variance,
  slab   = Citation,
  mods   = ~ Age + poly(Slope, degree = 2),
  random = ~ 1 | Region,
  data   = vel.focal,
  method = "ML"
)
My justification for not using Citation as a random effect is that Region alone accounts for more of the heterogeneity than random = list(~ 1 | Citation/ID, ~ 1 | Region) does, or than Citation/ID by itself.
What I need as output is the prediction for each age by region, but the predict() function for the model, and the associated forest plot, spits out a prediction for each row, as it assumes each row in the data is a unique study. In my case it is not, since my input values are separated by age and season.
predict(mod6)
pred se ci.lb ci.ub pi.lb pi.ub
Riehle and Griffith 1993.1 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Riehle and Griffith 1993.2 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Riehle and Griffith 1993.3 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Spina 2000.1 8.7706 2.7386 3.4030 14.1382 -0.7364 18.2776
Spina 2000.2 8.5407 2.7339 3.1824 13.8991 -0.9611 18.0426
Spina 2000.3 8.5584 2.7406 3.1868 13.9299 -0.9509 18.0676
Vondracek and Longanecker 1993.1 12.6116 2.5138 7.6847 17.5385 3.3462 21.8769
Vondracek and Longanecker 1993.2 12.6116 2.5138 7.6847 17.5385 3.3462 21.8769
Vondracek and Longanecker 1993.3 12.3817 2.5327 7.4176 17.3458 3.0965 21.6669
Vondracek and Longanecker 1993.4 12.3817 2.5327 7.4176 17.3458 3.0965 21.6669
Does anybody know a way to modify the arguments of predict() to control how the predictions are output, or to tell it that there are multiple rows per slab?
You need to use the newmods argument to specify the values for Age for which you want predicted values. You will have to plug in something for the linear and quadratic terms for the Slope variable as well (e.g., holding Slope constant at its mean and hence the quadratic term will just be the mean squared). Region is not a fixed effect, so it is not relevant if you want to compute predicted values based on the fixed effects. If you want to compute BLUPs for those random effects, you can do so with ranef(). One can then combine the predictions based on the fixed effects with the BLUPs. That would be the general idea, but implementing this will require a bit of programming.
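Concretely, the general idea might look like this (a sketch using the names from the question; it assumes Slope enters as raw polynomial terms, e.g. poly(Slope, 2, raw = TRUE); with the default orthogonal basis you would have to supply the transformed values instead):
## predicted values for each observed age, holding Slope at its mean;
## column order must match the order of the fixed effects in the model
slope.mean <- mean(vel.focal$Slope)
ages <- sort(unique(vel.focal$Age))
predict(mod6, newmods = cbind(ages, slope.mean, slope.mean^2))
## BLUPs for the Region random effects, to combine with the above
ranef(mod6)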

I am working on an ordered logit model and tried to compute the estimates and proportional odds in R. Is this correct?

Question: Have a look at data set Two.csv. It contains a potentially dependent categorical variable Y and two potentially independent variables {X1, X2} for each unit of measurement.
(a) Read data set Two.csv into R and have a look at the format of the dependent variable. Discuss three models which might be appropriate in this data situation, and which aspects speak in favor of and against each model.
(b) Suppose variable Y measures financial ratings A: y = 1, B: y = 2, and C: y = 3, that is, the creditworthiness A: high, B: intermediate, C: low of firm i. Model Y by means of an ordered logit model as a function of {X1, X2} and estimate your model with a built-in command.
(c) Explain the proportional-odds assumption and test whether the assumption is critical in the context of the data set at hand.
## a) Read data set Two.csv into R and have a look at the format of the dependent variable.
O <- read.table("C:/Users/DELL/Downloads/ExamQEIII2021/Two.csv", header = TRUE, sep = ";")
str(O)
dim(O)
View(O)
## b)
library(oglmx)
ologit <- oglmx(y ~ x1 + x2, data = O, link = "logit",
                constantMEAN = FALSE, constantSD = FALSE,
                delta = 0, threshparam = NULL)
results.ologis <- ologit.reg(y ~ x1 + x2, data = O)
summary(results.ologis)
## x1  1.46251
## x2 -0.45391
margins.oglmx(results.ologis, ascontinuous = FALSE)  # built-in command for the average marginal effects
## c) Explain the proportional-odds assumption and test whether the assumption
##    is critical in the context of the data set at hand.
# ordinal logit WITH proportional odds (PO)
library(VGAM)
a <- vglm(y ~ x1 + x2, family = cumulative(parallel = TRUE), data = O)
summary(a)
# ordinal logit WITHOUT proportional odds [a imposes PO; c does not]
c <- vglm(y ~ x1 + x2, family = cumulative(parallel = FALSE), data = O)
summary(c)
pchisq(deviance(a) - deviance(c), df.residual(a) - df.residual(c), lower.tail = FALSE)
## 0.4936413: no significant difference in the unexplained variance, so we
## cannot conclude that the PO assumption is critical.
# Equivalently, via a likelihood-ratio test:
LLa <- logLik(a)  # smaller (restricted) model, PO imposed
LLc <- logLik(c)  # larger (unrestricted) model
LR <- 2 * (LLc - LLa)
1 - pchisq(LR, df.residual(a) - df.residual(c))
## 0.4936413: same p-value as above.
## Conclusion: the likelihoods do not differ significantly, so relaxing the
## PO assumption does not improve the model for these data.
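For part (b), a more standard built-in command is MASS::polr(), which fits the same proportional-odds model. A sketch, assuming y is coded 1/2/3 as in the question (note that sign conventions for the coefficients differ between polr, oglmx, and VGAM's cumulative()):
library(MASS)
O$y <- factor(O$y, levels = c(1, 2, 3), ordered = TRUE)  # A = 1, B = 2, C = 3
fit <- polr(y ~ x1 + x2, data = O, method = "logistic", Hess = TRUE)
summary(fit)  # slope estimates plus the two thresholds (1|2 and 2|3)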

ivprobit error: the leading minor of order # is not positive definite

I am trying to run an ivprobit regression on individual crime and demographic variables. The dependent variable is annualassault, and I have 24 independent demographic variables. The right-hand-side endogenous variable is sogs_total, and the instruments are drivetime and distance. I got the error: the leading minor of order 4 is not positive definite.
I reduced the independent variables to only 5 because I thought I might be overfitting the model (too many variables), but I still get the same error.
library(ivprobit)
ivprobit.assault <- ivprobit(
  annualassault ~ age + sex + education_1 + education_2 + education_3 +
    education_4 + education_5 + education_6 + education_7 +
    maritalstatus_0 + maritalstatus_1 + maritalstatus_2 + maritalstatus_3 +
    maritalstatus_4 + a_nondm + d_nonam + m_nonad + ad_nonm + am_nond +
    dm_nona + adm | sogs_total | drivetime + distance,
  newdata
)
ivprobit.assault2 <- ivprobit(
  annualassault ~ age + sex + education_1 + maritalstatus_1 + am_nond |
    sogs_total | drivetime + distance,
  newdata
)
Error in chol.default(mat) :
the leading minor of order 4 is not positive definite
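This error comes from chol() being handed a singular (not positive definite) matrix, and with this many dummy variables the usual cause is perfect collinearity rather than overfitting: for example, including a complete set of education_* or maritalstatus_* dummies alongside the intercept. A quick check, sketched with the column names from the question, is to compute the rank of the design matrix before calling ivprobit():
# if the rank is less than the number of columns, some regressors are
# exact linear combinations of others; drop one dummy from each full set
# (e.g. education_1 and maritalstatus_0) and refit
X <- model.matrix(~ age + sex + education_1 + education_2 + education_3 +
                    education_4 + education_5 + education_6 + education_7 +
                    maritalstatus_0 + maritalstatus_1 + maritalstatus_2 +
                    maritalstatus_3 + maritalstatus_4 + a_nondm + d_nonam +
                    m_nonad + ad_nonm + am_nond + dm_nona + adm,
                  data = newdata)
qr(X)$rank
ncol(X)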

Error: for loop - replacement has length zero

I am new to R and am trying to do coursework on factor analysis with it.
I have two data sets, FundReturn (120 rows, 14 columns) and Factors (120 rows, 30 columns), and I want to run a one-factor regression for every possible pair of factor and fund, starting with the first 60 observations. With the estimated parameters, I calculate the predicted value of the 61st fund return from the 61st value of the factor. Then the estimation window is expanded by one observation, new parameters are estimated on the updated sample, the predicted 62nd fund return is calculated, and so on. In total, 60 predictions are made per pair, stored in Predictions = array(1, dim = c(60, 30, 14)), so I can compare them with the realized values.
The following is the code I used; it produces this error:
Error in Predictions[p, fa, fu] <- coeff[1, p, fa, fu] + coeff[2, p, fa, :
  replacement has length zero
Can anyone spot the problem? Your help is much appreciated.
Predictions <- array(1, dim = c(60, 30, 14))
coeff <- array(1, dim = c(3, 60, 30, 14))
v1 <- 1:30
v2 <- 1:60
v3 <- 1:14
for (fu in v3) {
  for (fa in v1) {
    for (p in v2) {
      y1 <- FundReturn[1:(59 + p), fu]
      x1 <- Factors[1:(59 + p), fa]
      Model <- lm(y1 ~ x1 + lag(y1))
      coeff[1:3, p, fa, fu] <- Model[["coefficients"]]
      Predictions[p, fa, fu] <- coeff[1, p, fa, fu] +
        coeff[2, p, fa, fu] * Factors[60 + p, fa] +
        coeff[3, p, fa, fu] * FundReturn[59 + p, fu]
    }
  }
}
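"replacement has length zero" means the right-hand side of the assignment evaluated to a zero-length value for some (p, fa, fu), so the first step is to find the failing indices and inspect each piece. Separately, stats::lag() does not shift a plain numeric vector, so lm(y1 ~ x1 + lag(y1)) ends up regressing y1 on a copy of itself rather than on the previous fund return. A sketch that builds the lag explicitly and fails loudly at the first bad index (assuming the data sets are as described):
Predictions <- array(NA, dim = c(60, 30, 14))
coeff <- array(NA, dim = c(3, 60, 30, 14))
for (fu in 1:14) {
  for (fa in 1:30) {
    for (p in 1:60) {
      y1 <- FundReturn[1:(59 + p), fu]
      x1 <- Factors[1:(59 + p), fa]
      y1lag <- c(NA, y1[-length(y1)])      # explicit one-period lag
      Model <- lm(y1 ~ x1 + y1lag)
      stopifnot(length(coef(Model)) == 3)  # localize any failure here
      coeff[1:3, p, fa, fu] <- coef(Model)
      Predictions[p, fa, fu] <- coeff[1, p, fa, fu] +
        coeff[2, p, fa, fu] * Factors[60 + p, fa] +
        coeff[3, p, fa, fu] * FundReturn[59 + p, fu]
    }
  }
}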

Why does multinom() predict many rows of probabilities for each level of the outcome?

I have a multinomial logistic regression, and the outcome variable has 6 levels: 10, 20, 60, 70, 80, 90.
test<-multinom(y ~ x1 + x2 + as.factor(x3) ,data=data1)
I want to predict the probabilities associated with each level of y for a given set of input values. So I run this:
dfin <- data.frame( ses = c(10,20,60,70,80,90), x1=2.1, x2=4, x3=40)
predict(test, todaydata = dfin, type = "probs")
But instead of getting 6 probabilities (one for each level of the outcome), I got many, many rows of probabilities. Each row has 6 probabilities (summing to 1), but I don't know why I get many rows or which row I should trust.
5541 7.226948e-01 1.498199e-01 8.086624e-02 1.253289e-02 8.799416e-03 2.528670e-02
5546 6.034188e-01 7.386553e-02 1.908132e-01 1.229962e-01 4.716406e-04 8.434623e-03
5548 7.266859e-01 1.278779e-01 1.001634e-01 2.032530e-02 7.156766e-03 1.779076e-02
5562 7.120179e-01 1.471181e-01 9.146071e-02 1.265592e-02 8.189511e-03 2.855781e-02
5666 6.645056e-01 3.034978e-02 1.687687e-01 1.219601e-01 3.972833e-03 1.044308e-02
5668 4.875966e-01 3.126855e-02 2.090006e-01 2.430828e-01 3.721631e-03 2.532970e-02
5670 3.900772e-01 1.305786e-02 1.803779e-01 4.137106e-01 1.314298e-03 1.462155e-03
5671 4.272971e-01 1.194599e-02 1.748494e-01 3.833422e-01 8.863019e-04 1.678975e-03
5674 5.477521e-01 2.587478e-02 1.650817e-01 2.487404e-01 3.368726e-03 9.182195e-03
5677 4.300207e-01 9.532836e-03 1.608679e-01 3.946310e-01 2.626104e-03 2.321351e-03
5678 4.542981e-01 1.220728e-02 1.410984e-01 3.885146e-01 2.670689e-03 1.210891e-03
5705 5.642322e-01 1.830575e-01 5.134181e-02 8.952808e-04 8.796467e-03 1.916767e-01
5706 6.161694e-01 1.094046e-01 1.979044e-01 1.095385e-02 7.254592e-03 5.831323e-02
....
Am I missing something in the code, or do I need to set some parameter?
It is returning the probability of each observation being in each of the classes. That is how multinomial logistic regressions work. You can imagine a series of binomial logistic regressions (one for each class) and then choosing the class with the highest probability; this is called the one-vs-all approach.
In your example, observation 5541 is predicted to be class 1 because the first column has the highest value (probability). Observation 5670 is class 4 because that is the column with the highest probability. The matrix has dimensions (# of observations) x (# of classes).
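One further thing worth checking in the call itself: predict() has no todaydata argument, and unknown arguments are silently ignored, so predict(test, todaydata = dfin, type = "probs") falls back to returning the fitted probabilities for every training row, which is exactly the many-rows output shown. Passing the data frame as newdata, with columns named as in the model formula (x1, x2, x3 rather than ses; the outcome levels do not belong in it), returns one row of probabilities per input combination:
# 'newdata' is the argument predict() actually uses; a misspelled name
# is swallowed silently and the fitted values are returned instead
dfin <- data.frame(x1 = 2.1, x2 = 4, x3 = 40)
predict(test, newdata = dfin, type = "probs")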
