I'm new to JAGS, and I'm trying to run a simple logistic regression. My data file is very simple: the response is binary and the one predictor I'm using has three levels. Like this:
col1: 1 2 2 2 1 1 1 2 1 2 ...
col2: HLL, HLL, LHL, LLL, LHL, HLL ...
Dummy coding
The levels in col2 are HLL, LHL, LLL. I dummy coded it and created a data frame that looks like this:
(intercept) HLL LHL LLL
1 1 0 0 1
2 1 0 0 1
4 1 0 0 1
5 1 0 1 0
6 1 0 1 0
7 1 0 0 1
Data list
My data file (myList), then, looks like this:
List of 5
$ y : num [1:107881] 2 2 2 2 2 2 2 2 2 2 ...
$ N : num 500
$ HLL: num [1:107881] 0 0 0 0 0 0 0 0 0 0 ...
$ LHL: num [1:107881] 0 0 0 1 1 0 0 0 0 1 ...
$ LLL: num [1:107881] 1 1 1 0 0 1 1 1 1 0 ...
I'm using N=500 because the full data frame is huge and I just want to test it.
Model
cat(
"model {
for( i in 1 : N ){
y[i] ~ dbern(mu[i])
mu[i] <- 1/(1+exp(-(a + b*HLL[i] + c*LHL[i] + d*LLL[i])))
}
a ~ dnorm(0, 1.0e-12)
b ~ dnorm(0, 1.0e-12)
c ~ dnorm(0, 1.0e-12)
d ~ dnorm(0, 1.0e-12)
}", file = "model.txt"
)
Running model + error
model = jags.model(file = "model.txt",
data = myList,
n.chains = 3, n.adapt = 500)
Error I get
Error in jags.model(file = "model.txt", data = myList, n.chains = 3, :
Error in node y[1]
Node inconsistent with parents
The dbern distribution expects the response to be in {0,1} rather than {1,2} as you seem to have coded it, so you need to subtract 1 from your values of y.
The "Node inconsistent with parents" error arises because an observed value of 2 has zero probability under any Bernoulli distribution, so JAGS cannot evaluate the likelihood of y[1]. If recoding the response does not fix it on its own, you could also try the following (a combined sketch appears after the list):
1) Tighten the priors for a/b/c/d - a precision of 1.0e-12 corresponds to a variance of 10^12, which is extremely vague and invites numerical problems
2) Instead of:
mu[i] <- 1/(1+exp(-(a + b*HLL[i] + c*LHL[i] + d*LLL[i])))
You could write:
logit(mu[i]) <- a + b*HLL[i] + c*LHL[i] + d*LLL[i]
This might also help JAGS to recognise the model as a GLM and select the appropriate samplers - remember to load the glm module.
3) Set some initial values for a/b/c/d that are vaguely consistent with your data (perhaps obtained using a fit with glm() in R)
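Putting those suggestions together, here is a minimal sketch of the recoded response and reworked model (this assumes your data really are in myList as shown above; the prior precision of 1.0e-4 is just an illustrative, less extreme choice):

#Recode the response from {1,2} to {0,1}, as dbern requires
myList$y <- myList$y - 1

cat(
"model {
  for (i in 1:N) {
    y[i] ~ dbern(mu[i])
    logit(mu[i]) <- a + b*HLL[i] + c*LHL[i] + d*LLL[i]
  }
  # an intercept plus all three dummies is over-parameterised; dropping
  # one dummy (e.g. HLL as the reference level) would make the
  # coefficients cleanly identifiable
  a ~ dnorm(0, 1.0e-4)
  b ~ dnorm(0, 1.0e-4)
  c ~ dnorm(0, 1.0e-4)
  d ~ dnorm(0, 1.0e-4)
}", file = "model.txt"
)

library(rjags)
load.module("glm")   #lets JAGS pick the specialised GLM samplers
model <- jags.model(file = "model.txt", data = myList,
                    n.chains = 3, n.adapt = 500)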
I solved it with
mu[i] <- 1/(1.000001+exp(-(a + b*HLL[i] + c*LHL[i] + d*LLL[i])))
which keeps mu[i] strictly below 1, so the likelihood never hits the boundary even when the linear predictor overflows.
Related
I want to run a multinomial mixed effects model with the mclogit package of R.
The head of my data frame is shown below.
> head(mydata)
ID VAR1 X1 Time Y other_X3 other_X4 other_X5 other_X6 other_X7
1 1 1 1 1 10 0 0 0 0 0
2 1 1 1 2 5 1 1 1 0 2
3 2 2 3 1 10 0 0 0 0 0
4 2 2 3 2 7 1 0 0 0 2
5 3 1 3 1 10 0 0 0 0 0
6 3 1 3 2 7 1 0 0 0 2
The Y variable is a categorical variable with 10 levels (1-10; it is a score variable).
What I want is a model for Y ~ X1 + Time, adding a
random intercept effect (for the ID variable) and a random slope effect (for the Time variable).
I tried the following command but I got an error.
> mixed_model <- mclogit( cbind(Y, ID) ~ X1 + Time + X1*Time,
+ random = list(~1|ID, ~Time|ID), data = mydata)
Error in FUN(X[[i]], ...) :
No predictor variable remains in random part of the model.
Please reconsider your model specification.
In addition: Warning messages:
1: In mclogit(cbind(Y, ID) ~ X1 + Time + X1 * Time, random = list(~1 | :
removing X1 from model due to insufficient within-choice set variance
2: In FUN(X[[i]], ...) : removing intercept from random part of the model
because of insufficient within-choice set variance
Any idea how to correct it?
Thank you in advance.
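One possible direction, offered as a hedged sketch: mclogit() models counts over choice sets (which is why the within-choice-set variance warnings appear), whereas Y here is a 10-level categorical score. The mblogit() function from the same package fits a baseline-category multinomial logit and accepts random effects; the random-slope syntax below is an assumption to check against your installed version's documentation:

library(mclogit)
mydata$Y <- factor(mydata$Y)            #treat the score as categorical
mixed_model <- mblogit(Y ~ X1*Time,     #X1*Time expands to X1 + Time + X1:Time
                       random = ~1 + Time | ID,  #random intercept and Time slope per ID
                       data = mydata)
summary(mixed_model)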
I'm trying to understand some code that builds a model matrix in R, but I'm having trouble understanding some of the basic syntax.
Here's some reproducible code below:
test_df <- data.frame(category = c("Poetry", "Narrative Film", "Music"),
                      country = c("GB", "US", "US"),
                      usd_goal_real = c(1534, 30000, 45000),
                      time_int = c(59, 60, 45),
                      state = c(0, 0, 0))
test_df2 <- data.frame(model.matrix( ~ . -1, test_df))
test_df3 <- data.frame(model.matrix( ~ . , test_df))
What exactly is specified in the line test_df2 <- data.frame(model.matrix( ~ . -1, test_df)) ?
Specifically, what does the ~ . -1 mean? Is this excluding a field from the model? How does it differ from the formula ~ . in the next line?
The simplest answer is that the -1 in the formula passed to model.matrix removes the intercept term (the X.Intercept. column) from the model.
data.frame(model.matrix( ~ . -1, test_df)) produces:
categoryMusic categoryNarrative.Film categoryPoetry countryUS usd_goal_real time_int state
1 0 0 1 0 1534 59 0
2 0 1 0 1 30000 60 0
3 1 0 0 1 45000 45 0
and data.frame(model.matrix( ~ . , test_df)) produces:
X.Intercept. categoryNarrative.Film categoryPoetry countryUS usd_goal_real time_int state
1 1 0 1 0 1534 59 0
2 1 1 0 1 30000 60 0
3 1 0 0 1 45000 45 0
Since there is a categorical variable in the model, you will also notice that the Music level of that variable disappears when there is an intercept in the model: the first level of the factor is absorbed into the intercept, and all other levels are measured relative to it.
These are two different ways of parameterising your model.
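A minimal sketch of why the two parameterisations are equivalent, using the test_df defined above: with ~ . -1 every category level gets its own column, while with ~ . the first level is folded into the intercept; the two matrices span the same column space, so they produce identical fitted values.

#Cell-means coding: one column per level, no intercept
m1 <- model.matrix(~ category - 1, test_df)
#Treatment coding: intercept plus contrasts against the first level
m2 <- model.matrix(~ category, test_df)

#Regressions on either parameterisation agree:
f1 <- fitted(lm(usd_goal_real ~ category - 1, data = test_df))
f2 <- fitted(lm(usd_goal_real ~ category, data = test_df))
all.equal(f1, f2)   #TRUE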
#Here is my code:
library(MASS)     #Support Functions and Datasets for Venables and Ripley's MASS
library(caret)    #Classification and Regression Training
library(stepPlr)  #L2 penalized logistic regression with a stepwise variable selection
library(janitor)  #Simple Tools for Examining and Cleaning Dirty Data
#Howells is the main data frame; we will subset it.
HNORSE <- Howells[which(Howells$Pop == 'NORSE'), ]
#Remove the empty (all-NA) columns, using the janitor package
HNORSE <- remove_empty_cols(HNORSE)
#Assigning 0's and 1's to females and males resp.
HNORSE$PopSex[HNORSE$PopSex=="NORSEF"] <- '0'
HNORSE$PopSex[HNORSE$PopSex=="NORSEM"] <- '1'
HNORSE$PopSex <- as.numeric(HNORSE$PopSex)
HNORSE$PopSex
#Resultant column looks like this
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [41] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [81] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I want to use step.plr from the stepPlr package
a <- step.plr(HNORSE[, 6:76], HNORSE$PopSex,
              lambda = 1e-4, cp = "bic",
              max.terms = 1, trace = TRUE, type = "forward")
#Where HNORSE[, 6:76] --> features
#HNORSE$PopSex --> binary response
#lambda --> default value
#max.terms --> I tried more than one value for max.terms, but R then loops
#endlessly with 'Convergence Error' messages; that's why I use max.terms = 1
Then I ran the summary command on "a":
summary(a)
Call: plr(x = ix0, y = y, weights = weights, offset.subset = offset.subset,
offset.coefficients = offset.coefficients, lambda = lambda,
cp = cp)
Coefficients:
           Estimate Std.Error z value Pr(>|z|)
Intercept -71.93470   13.3521  -5.388        0
ZYB         0.55594    0.1033   5.382        0
Null deviance: 152.49 on 109 degrees of freedom
Residual deviance: 57.29 on 108 degrees of freedom
Score: deviance + 4.7 * df = 66.69
Since I used step.plr, I should then use predict.stepplr and not predict.plr, right?
By this logic I wish to use predict.stepplr. The example from the function's documentation goes like this:
n <- 100
p <- 5
x0 <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
x0 <- cbind(rnorm(n),x0)
y <- sample(c(0,1),n,replace=TRUE)
level <- vector("list",length=6)
for (i in 2:6) level[[i]] <- seq(3)
fit <- step.plr(x0,y,level=level)
x1 <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
x1 <- cbind(rnorm(n),x1)
pred1 <- predict(fit,x0,x1,type="link")
pred2 <- predict(fit,x0,x1,type="response")
pred3 <- predict(fit,x0,x1,type="class")
object: stepplr object
x: matrix of features used for fitting object.
If newx is provided, x must be provided as well.
newx: matrix of features at which the predictions are made.
If newx=NULL, predictions for the training data are returned.
type: If type=link, the linear predictors are returned;
if type=response, the probability estimates are returned; and
if type=class, the class labels are returned. Default is type=link.
...: other options for prediction.
So, first of all, I did not do any sampling like is shown here.
I want to predict HNORSE$PopSex, which is a binary variable.
My feature set, which does not include the binary response column, is HNORSE[,c(6:76)].
I want to know what I should pass as the x0 and x1 arguments (that is, x and newx) when I call
predict.stepplr().
If that is not right, how do I correctly use
predict.stepplr?
I also want to compute the overall accuracy so that I can plot its density with plot(density(overall_accuracy)).
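Based on the documentation quoted above, a hedged sketch of what the call could look like for this data (a and HNORSE as defined earlier; with newx = NULL the predictions are for the training data, so no separate sampling is needed):

trainX <- HNORSE[, 6:76]   #the same features used to fit `a`
pred_class <- predict(a, x = trainX, type = "class")      #0/1 labels
pred_prob  <- predict(a, x = trainX, type = "response")   #fitted probabilities

#Overall training accuracy; note this is a single number, so a density plot
#would need accuracies from repeated resampling (e.g. via caret)
overall_accuracy <- mean(pred_class == HNORSE$PopSex)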
I have this data:
> head(k2)
fromTo adja sims
1 1 - 1 0 0
2 2 - 1 1 3
3 3 - 1 0 2
4 4 - 1 0 5
5 5 - 1 0 3
6 6 - 1 1 3
where adja is the class that needs to be predicted (0 or 1) and sims is the main feature that the prediction is based on.
I am using the e1071 library in R to run an SVM on this data. I have the following script:
library(e1071)

s <- subset(k2, select = -adja)   #features only
t <- subset(k2, select = adja)    #true labels

svm_model <- svm(adja ~ ., data = k2, type = 'C-classification')
summary(svm_model)

pred <- predict(svm_model, s)
table(pred, t$adja)

svm_tune <- tune.svm(adja ~ sims, data = k2, cost = 10^(-1:2), gamma = (0.1:2.5))

svm_modeltuned <- svm(adja ~ sims, data = k2, type = 'C-classification',
                      kernel = "radial", cost = 100, gamma = 1)
p <- predict(svm_modeltuned, s)
table(p, t$adja)
If I print p, all the predictions are 0:
> table(p)
p
0 1
9740 0
What am I doing wrong?
Thanks
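One thing worth checking, offered as a hedged sketch: the script tunes cost and gamma but then hard-codes cost = 100 and gamma = 1 instead of using the tuned values, and (0.1:2.5) only evaluates to c(0.1, 1.1, 2.1). Making the response an explicit factor and reusing the model selected by tune.svm might look like this (assumes the k2 data frame from above):

k2$adja <- factor(k2$adja)   #make sure svm treats adja as a class label

svm_tune <- tune.svm(adja ~ sims, data = k2,
                     cost  = 10^(-1:2),
                     gamma = seq(0.1, 2.5, by = 0.3))   #an actual grid of gammas

best <- svm_tune$best.model   #already refitted with the selected cost/gamma
p <- predict(best, k2)
table(p, k2$adja)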
I would like to regress the dependent variable of a logistic regression (in my data set it is dat$admit) on each of the other available variables in turn, one regression per independent variable.
The outcome I want is a list of each regression's summary. Using the data set submitted below, there should be 3 regressions.
Here is a sample data set (where admit is the logistic regression dependent variable) :
dat <- read.table(text = "
female apcalc admit num
0 0 0 7
0 0 1 1
0 1 0 3
0 1 1 7
1 0 0 5
1 0 1 1
1 1 0 0
1 1 1 6",
header = TRUE)
I got an example for simple linear regression, but when I tried to change the function from lm to glm, I got list() as a result.
Here is the original code, for the iris dataset, where "Sepal.Length" is the dependent variable:
sapply(names(iris)[-1],
function(x) lm.fit(cbind(1, iris[,x]), iris[,"Sepal.Length"])$coef)
How can I create the right function for a logistic regression?
This is perhaps a little too condensed, but it does the job.
Of course, the sample data set is too small to get any sensible
answers ...
t(sapply(setdiff(names(dat),"admit"),
function(x) coef(glm(reformulate(x,response="admit"),
data=dat,family=binomial))))
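Since the question asked for a list of each regression's summary rather than just the coefficients, a slightly longer variant along the same lines (same dat, same predictors):

predictors <- setdiff(names(dat), "admit")
fits <- lapply(predictors,
               function(x) glm(reformulate(x, response = "admit"),
                               data = dat, family = binomial))
names(fits) <- predictors
lapply(fits, summary)   #one summary per single-predictor logistic regression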