I want to fit a multinomial mixed-effects model with the mclogit package in R.
Here is the head of my data frame:
> head(mydata)
ID VAR1 X1 Time Y other_X3 other_X4 other_X5 other_X6 other_X7
1 1 1 1 1 10 0 0 0 0 0
2 1 1 1 2 5 1 1 1 0 2
3 2 2 3 1 10 0 0 0 0 0
4 2 2 3 2 7 1 0 0 0 2
5 3 1 3 1 10 0 0 0 0 0
6 3 1 3 2 7 1 0 0 0 2
The Y variable is categorical with 10 levels (1-10; it is a score variable).
What I want is a model for Y ~ X1 + Time, adding a
random intercept (for the ID variable) and a random slope (for the Time variable).
I tried the following command but got an error:
> mixed_model <- mclogit( cbind(Y, ID) ~ X1 + Time + X1*Time,
+ random = list(~1|ID, ~Time|ID), data = mydata)
Error in FUN(X[[i]], ...) :
No predictor variable remains in random part of the model.
Please reconsider your model specification.
In addition: Warning messages:
1: In mclogit(cbind(Y, ID) ~ X1 + Time + X1 * Time, random = list(~1 | :
removing X1 from model due to insufficient within-choice set variance
2: In FUN(X[[i]], ...) : removing intercept from random part of the model
because of insufficient within-choice set variance
Any idea how to correct it?
Thank you in advance.
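One hedged sketch of a fix, reading the warnings literally: mclogit() fits conditional logit models on choice-set data, so it drops X1 and the intercept because they do not vary within a choice set. For one categorical response per row, mblogit() from the same package fits a baseline-category multinomial logit and accepts random-effects formulas directly. The data below are simulated stand-ins for mydata (with 4 score levels instead of 10 to keep the sketch small), so this is an illustration of the call, not the asker's exact model:

```r
# Sketch with simulated stand-in data, assuming the mclogit package.
library(mclogit)

set.seed(1)
n_id <- 80
mydata <- data.frame(
  ID   = rep(seq_len(n_id), each = 2),
  X1   = rep(sample(1:3, n_id, replace = TRUE), each = 2),
  Time = rep(1:2, times = n_id)
)
mydata$Y <- factor(sample(1:4, nrow(mydata), replace = TRUE))  # response as factor

mixed_model <- mblogit(
  Y ~ X1 * Time,        # X1 * Time already expands to X1 + Time + X1:Time
  random = ~ 1 | ID,    # random intercept per ID; a Time slope would be
                        # random = ~ Time | ID in newer mclogit versions
  data   = mydata
)
```

If Y is really an ordered 1-10 score, an ordinal mixed model (e.g. ordinal::clmm) may be a better match than a multinomial one.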
#Here is my code:
#library() loads one package per call:
library(MASS)
library(caret)
library(stepPlr)
library(janitor)
#stepPlr: L2 penalized logistic regression with a stepwise variable selection
#MASS: Support Functions and Datasets for Venables and Ripley's MASS
#caret: Classification and Regression Training
#janitor: Simple Tools for Examining and Cleaning Dirty Data
#Howells is a main dataframe, we will segregate it.
HNORSE <- Howells[which(Howells$Pop == 'NORSE'), ]
#Let's remove NA cols
#We will use janitor package here to remove NA cols
HNORSE <- remove_empty(HNORSE, which = "cols")  # remove_empty_cols() in older janitor versions
#Assigning 0's and 1's to females and males resp.
HNORSE$PopSex[HNORSE$PopSex=="NORSEF"] <- '0'
HNORSE$PopSex[HNORSE$PopSex=="NORSEM"] <- '1'
HNORSE$PopSex <- as.numeric(HNORSE$PopSex)
HNORSE$PopSex
#Resultant column looks like this
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[41] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[81] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I want to use step.plr() from the stepPlr package:
a <- step.plr(HNORSE[,c(6:76)], HNORSE$PopSex, lambda = 1e-4, cp="bic", max.terms = 1, trace = TRUE, type = "forward")
#Where HNORSE[,c(6:76)] --> features
#HNORSE$PopSex ---> Binary response
#lambda ----> Default value
#max.terms ---> I tried more than 1 value for max.terms, but then R goes into infinite loop of 'Convergence Error'.
#That's why using max.terms=1
Then I ran summary() on a:
summary(a)
Call: plr(x = ix0, y = y, weights = weights, offset.subset = offset.subset,
offset.coefficients = offset.coefficients, lambda = lambda,
cp = cp)
Coefficients:
           Estimate  Std.Error  z value  Pr(>|z|)
Intercept -71.93470    13.3521   -5.388         0
ZYB         0.55594     0.1033    5.382         0
Null deviance: 152.49 on 109 degrees of freedom
Residual deviance: 57.29 on 108 degrees of freedom
Score: deviance + 4.7 * df = 66.69
I used step.plr, so I should then use predict.stepplr, right, and not predict.plr?
By this logic I wish to use predict.stepplr. The example from the documentation goes like this:
n <- 100
p <- 5
x0 <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
x0 <- cbind(rnorm(n),x0)
y <- sample(c(0,1),n,replace=TRUE)
level <- vector("list",length=6)
for (i in 2:6) level[[i]] <- seq(3)
fit <- step.plr(x0,y,level=level)
x1 <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
x1 <- cbind(rnorm(n),x1)
pred1 <- predict(fit,x0,x1,type="link")
pred2 <- predict(fit,x0,x1,type="response")
pred3 <- predict(fit,x0,x1,type="class")
object: stepplr object
x:      matrix of features used for fitting object.
        If newx is provided, x must be provided as well.
newx:   matrix of features at which the predictions are made.
        If newx=NULL, predictions for the training data are returned.
type:   if type="link", the linear predictors are returned;
        if type="response", the probability estimates are returned; and
        if type="class", the class labels are returned. Default is type="link".
...:    other options for prediction.
First of all, I did not do any sampling like that shown here.
I want to predict HNORSE$PopSex which is binary variable.
My dataset which does not include the binary variable column is HNORSE[,c(6:76)].
I want to know what the x0 and x1 arguments should be when I call
predict.stepplr().
If that is not the right approach, how do I correctly use
predict.stepplr?
Finally, I want to compute the overall accuracy and plot its density, e.g. plot(density(overall_accuracy)).
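Going by the quoted documentation, x is the training feature matrix and newx is optional; with newx omitted, predictions for the training data are returned. A hedged, self-contained sketch reusing the documentation's own simulated data (for the real data, x would be as.matrix(HNORSE[, 6:76]) and the comparison would be against HNORSE$PopSex):

```r
# Sketch, assuming the stepPlr package; simulated data as in its docs.
library(stepPlr)

set.seed(1)
n <- 100
p <- 5
x0 <- matrix(sample(seq(3), n * p, replace = TRUE), nrow = n)  # training features
y  <- sample(c(0, 1), n, replace = TRUE)                       # binary response

fit <- step.plr(x0, y, lambda = 1e-4, cp = "bic", max.terms = 1)

# In-sample predictions: pass the training matrix as x, omit newx
pred_class <- predict(fit, x = x0, type = "class")

# Overall accuracy on the training data
overall_accuracy <- mean(pred_class == y)
```

For a density plot over repeated runs (e.g. resamples), collect the accuracies in a vector and call plot(density(accuracies)).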
I'm new to JAGS, and I'm trying to run a simple logistic regression. My data file is very simple: the response is binary and the one predictor I'm using has three levels. Like this:
col1: 1 2 2 2 1 1 1 2 1 2 ...
col2: HLL, HLL, LHL, LLL, LHL, HLL ...
Dummy coding
The levels in col2 are HLL, LHL, LLL. I dummy coded it and created a data frame that looks like this:
(intercept) HLL LHL LLL
1 1 0 0 1
2 1 0 0 1
4 1 0 0 1
5 1 0 1 0
6 1 0 1 0
7 1 0 0 1
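This kind of dummy coding can also be generated with base R's model.matrix(), which avoids hand-building the indicator columns. A small self-contained sketch with a made-up col2:

```r
# Sketch: dummy-code a 3-level factor with base R's model.matrix().
col2 <- factor(c("HLL", "HLL", "LHL", "LLL", "LHL", "HLL"),
               levels = c("HLL", "LHL", "LLL"))

# Default treatment coding: an intercept plus indicators for LHL and LLL
X <- model.matrix(~ col2)

# One indicator column per level, no intercept (matches the table above,
# minus its separate intercept column)
X_full <- model.matrix(~ col2 - 1)
colnames(X_full)   # "col2HLL" "col2LHL" "col2LLL"
```

Note that a model containing an intercept plus all three indicators (as in the JAGS model further down) is over-parameterised, since the three dummies always sum to 1; drop either the intercept or one level.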
Data list
My data file (myList), then, looks like this:
List of 5
$ y : num [1:107881] 2 2 2 2 2 2 2 2 2 2 ...
$ N : num 500
$ HLL: num [1:107881] 0 0 0 0 0 0 0 0 0 0 ...
$ LHL: num [1:107881] 0 0 0 1 1 0 0 0 0 1 ...
$ LLL: num [1:107881] 1 1 1 0 0 1 1 1 1 0 ...
I'm using N=500 because the full data frame is huge and I just want to test it.
Model
cat(
"model {
for( i in 1 : N ){
y[i] ~ dbern(mu[i])
mu[i] <- 1/(1+exp(-(a + b*HLL[i] + c*LHL[i] + d*LLL[i])))
}
a ~ dnorm(0, 1.0e-12)
b ~ dnorm(0, 1.0e-12)
c ~ dnorm(0, 1.0e-12)
d ~ dnorm(0, 1.0e-12)
}", file = "model.txt"
)
Running model + error
model = jags.model(file = "model.txt",
data = myList,
n.chains = 3, n.adapt = 500)
Error I get
Error in jags.model(file = "model.txt", data = myList, n.chains = 3, :
Error in node y[1]
Node inconsistent with parents
The dbern distribution expects response in {0,1} rather than {1,2} as it seems you have coded it, so you need to subtract 1 from your values of y.
It is a bit strange that you get this error, as dbern does not usually give an error for other response values (it essentially treats values <0 as 0 and >1 as 1). The error probably stems from the response being all the same value; if subtracting 1 doesn't fix it, you could try the following:
1) Try increasing the precision of your priors for a/b/c/d slightly - a variance of 10^12 is quite a lot
2) Instead of:
mu[i] <- 1/(1+exp(-(a + b*HLL[i] + c*LHL[i] + d*LLL[i])))
You could write:
logit(mu[i]) <- -(a + b*HLL[i] + c*LHL[i] + d*LLL[i])
This might also help JAGS to recognise this as a GLM and initiate the appropriate samplers - remember to load the glm module.
3) Set some initial values for a/b/c/d that are vaguely consistent with your data (perhaps obtained using a fit with glm() in R)
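Suggestion 3 can be sketched in base R. The simulated data below stand in for the real file; note that with an intercept only two of the three dummies are identifiable (they sum to 1), so glm() reports the redundant one as NA:

```r
# Sketch: rough starting values for a/b/c/d from a base-R glm() fit,
# using simulated data in place of the real file.
set.seed(42)
n   <- 500
lev <- sample(c("HLL", "LHL", "LLL"), n, replace = TRUE)
HLL <- as.numeric(lev == "HLL")
LHL <- as.numeric(lev == "LHL")
LLL <- as.numeric(lev == "LLL")
y   <- rbinom(n, 1, plogis(-0.5 + 1.0 * HLL))   # response already in {0,1}

# With an intercept, HLL + LHL + LLL is collinear with it;
# glm() drops the redundant term automatically (coefficient shows as NA).
fit <- glm(y ~ HLL + LHL + LLL, family = binomial)
coef(fit)

# Plausible initial values for the JAGS chains (d fixed at 0, since its
# dummy is redundant given the intercept)
inits <- list(a = unname(coef(fit)[1]),
              b = unname(coef(fit)["HLL"]),
              c = unname(coef(fit)["LHL"]),
              d = 0)
```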
I solved it with
mu[i] <- 1/(1.000001+exp(-(a + b*HLL[i] + c*LHL[i] + d*LLL[i])))
Consider a data set train:
z a
1 1
0 2
0 1
1 3
0 1
1 2
1 1
0 3
0 1
1 3
with a binary outcome variable z and a categorical predictor a with three levels: 1,2,3.
Now consider a data set test:
z a
1
1
2
1
2
2
1
When I run the following code:
library(randomForest)
set.seed(825)
RFfit1 <- randomForest(z~a, data=train, importance=TRUE, ntree=2000)
RFprediction1 <- predict(RFfit1, test)
I get the following error message:
Error in predict.randomForest(RFfit1, test1) :
Type of predictors in new data do not match that of the training data.
I am assuming this is because the variable a in the test data set does not have three levels. How would I fix this?
You must assign it the same factor levels as train:
test$a <- factor(test$a, levels=levels(train$a))
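A self-contained sketch of the mismatch and the fix, using made-up vectors in place of the train/test columns: predict.randomForest compares factor levels, so the test factor must carry all training levels even when some never occur in it.

```r
# Sketch: align factor levels between train and test (base R only).
train_a <- factor(c(1, 2, 1, 3, 1, 2, 1, 3, 1, 3))   # levels "1" "2" "3"
test_a  <- factor(c(1, 1, 2, 1, 2, 2, 1))            # level "3" never occurs

levels(test_a)        # "1" "2"      -- mismatch with train
test_a <- factor(test_a, levels = levels(train_a))
levels(test_a)        # "1" "2" "3"  -- now matches, values unchanged
```

Separately, if you want classification rather than regression, z should also be a factor in both data sets; with a numeric 0/1 z, randomForest() fits a regression forest.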
I would like to run a logistic regression using a specific group (range of values) of a categorical variable. I did the following steps:
1. I cut the variable into groups:
cut_Var3 <- cut(dat$Var3,breaks=c(0,3,6,9))
the outcome of table(cut_Var3) gave me this output (cut_Var3 was turned into a factor):
# (0,3] (3,6] (6,9]
# 5 4 4
I wanted to do a logistic regression with another variable, but relative to the (3,6] level only,
so that the regression is expressed against the 4 observations of the second group.
2. I tried to write this line of code (and also other variations):
ff <- glm( TargetVar ~ relevel(cut_Var3,3:6), data = dat)
but with no luck.
What should I do in order to run it properly?
attached is an example data set:
dat <- read.table(text = " TargetVar Var1 Var2 Var3
0 0 0 7
0 0 1 1
0 1 0 3
0 1 1 7
1 0 0 5
1 0 1 1
1 1 0 0
1 1 1 6
0 0 0 8
0 0 1 5
1 1 1 4
0 0 1 2
1 0 0 9
1 1 1 2 ", header = TRUE)
For relevel you need to specify the level label exactly as it appears in the factor:
glm( TargetVar ~ relevel(cut_Var3,"(3,6]"), data = dat)
Call: glm(formula = TargetVar ~ relevel(cut_Var3, "(3,6]"), data = dat)
Coefficients:
(Intercept) relevel(cut_Var3, "(3,6]")(0,3]
0.75 -0.35
relevel(cut_Var3, "(3,6]")(6,9]
-0.50
Degrees of Freedom: 12 Total (i.e. Null); 10 Residual
(1 observation deleted due to missingness)
Null Deviance: 3.231
Residual Deviance: 2.7 AIC: 24.46
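One caveat worth noting: the fit shown above is a linear model, because gaussian is glm's default family (hence the small deviances). For an actual logistic regression, add family = binomial. A self-contained sketch using the sample data from the question:

```r
# Logistic (rather than gaussian) version of the relevel fit above.
dat <- read.table(text = "TargetVar Var1 Var2 Var3
0 0 0 7
0 0 1 1
0 1 0 3
0 1 1 7
1 0 0 5
1 0 1 1
1 1 0 0
1 1 1 6
0 0 0 8
0 0 1 5
1 1 1 4
0 0 1 2
1 0 0 9
1 1 1 2", header = TRUE)

cut_Var3 <- cut(dat$Var3, breaks = c(0, 3, 6, 9))   # Var3 = 0 becomes NA

ff <- glm(TargetVar ~ relevel(cut_Var3, "(3,6]"),
          data = dat, family = binomial)
coef(ff)   # coefficients are now log-odds relative to the (3,6] group
```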
I would like to regress the dependent variable of my logistic regression (in my data set: dat$admit) on each of the other available variables in turn, one regression per independent variable.
The outcome I want is a list of each regression's summary. Using the data set below, there should be 3 regressions.
Here is a sample data set (where admit is the logistic regression dependent variable) :
dat <- read.table(text = "
female apcalc admit num
0 0 0 7
0 0 1 1
0 1 0 3
0 1 1 7
1 0 0 5
1 0 1 1
1 1 0 0
1 1 1 6",
header = TRUE)
I found an example for simple linear regression, but when I tried to change the function from lm to glm I got list() as a result.
Here is the original code - for the iris dataset where "Sepal.Length" is the dependent variable :
sapply(names(iris)[-1],
function(x) lm.fit(cbind(1, iris[,x]), iris[,"Sepal.Length"])$coef)
How can I create the right function for a logistic regression?
This is perhaps a little too condensed, but it does the job.
Of course, the sample data set is too small to get any sensible
answers ...
t(sapply(setdiff(names(dat),"admit"),
function(x) coef(glm(reformulate(x,response="admit"),
data=dat,family=binomial))))
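To get the full list of regression summaries the question asks for, rather than just the coefficient matrix, one variant keeps the fitted models in a named list and applies summary() to each. A self-contained sketch with the sample data:

```r
# One logistic regression per predictor, returning full summaries.
dat <- read.table(text = "
female apcalc admit num
0 0 0 7
0 0 1 1
0 1 0 3
0 1 1 7
1 0 0 5
1 0 1 1
1 1 0 0
1 1 1 6",
header = TRUE)

predictors <- setdiff(names(dat), "admit")
fits <- lapply(predictors,
               function(x) glm(reformulate(x, response = "admit"),
                               data = dat, family = binomial))
names(fits) <- predictors
summaries <- lapply(fits, summary)   # one glm summary per predictor
```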