Using the setup:
id <- c(1,1,2,2,3,4)
t <- c(1,2,1,2,1,1)
x <- c(1,2,2,1,2,1)
y <- c(1,0,0,0,1,0)
df <- data.frame(id, t, x, y)
Preparing fitted values from an OLS regression as the starting values:
tstart <- lm(y ~ x, data = df)
Running probit:
tfit <- pglm(y ~ x, data = df, index = c("id", "t"),
             model = "within", family = binomial('probit'),
             start = tstart$fitted.values)
Returns the error
Error in lnl.binomial(param = start, y = y, X = X, id = id, model = model, :
object 'Li' not found
This error seems very uninformative to me. I am not referring to any object 'Li' in my calls, and I have no idea what this object is supposed to be.
The traceback suggests that the error occurs in the function:
9: lnl.binomial(param = start, y = y, X = X, id = id, model = model,
link = link, rn = rn) at <text>#1
But trying to look at the code of the function where the error occurs reveals that there is not even a visible function called
lnl.binomial()
Where did I go wrong?
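One guess, offered purely as an assumption about what pglm expects (maximum-likelihood routines usually take start as a short vector of starting coefficients rather than one fitted value per observation), would be to pass the OLS coefficients instead of the fitted values; I cannot confirm that this is what lnl.binomial actually wants:
# hedged sketch: assumes 'start' should be a vector of starting coefficients,
# not n fitted values
tfit <- pglm(y ~ x, data = df, index = c("id", "t"),
             model = "within", family = binomial('probit'),
             start = coef(lm(y ~ x, data = df)))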
I am using R-INLA to run the following model (Treatment, Animal.1 and Animal.2 are factors and Encounter.Length is continuous):
formula <- Encounter.Length ~ Treatment + f(Animal.1, model = "iid", n = n.animal) +
  f(Animal.2, copy = "Animal.1")
m.1 <- inla(formula, data = inla.dat)
However, after running this code I get the following error message:
Error in inla(formula, data = inla.dat) :
In f(Animal.1): 'covariate' must match 'values', and both must either be 'numeric', or 'factor'/'character'.
I am new to using INLA and want to know what this error message means and how to fix it.
Answer (from r-inla.help): The levels of B are not a subset of the levels of A (which is used to define the model that B copies), so you must define both models on the union of the levels.
For example:
n <- 3
A <- as.factor(letters[1:n])
B <- as.factor(letters[1+1:n])
y <- 1:n
This does not work
inla(y ~ -1 + f(A) + f(B, copy = "A"), data = data.frame(A, B))
But this does
values <- as.factor(unique(c(levels(A), levels(B))))
inla(y ~ -1 + f(A, values = values) + f(B, copy = "A"),
     data = list(A = A, B = B, values = values))
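A quick way to see whether you are in that situation at all is to compare the factor levels directly; this is plain base R, nothing INLA-specific:
# levels of B that A does not have; if this is non-empty, the copy will fail
setdiff(levels(B), levels(A))
# the union of levels to define both models on, as in the example above
values <- as.factor(union(levels(A), levels(B)))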
I'm trying to specify a cluster variable after plm using vcovCR() from the clubSandwich package on my simulated data (which I use for a power simulation), but I get the following error message:
"Error in [.data.frame(eval(mf$data, envir), , index_names) : undefined columns selected"
I'm not sure if this is specific to vcovCR() or something general about R, but could anyone tell me what's wrong with my code? (I saw a related post here How to cluster standard errors of plm at different level rather than id or time?, but it didn't solve my problem).
My code:
N <- 100
id <- 1:N
id <- c(id, id)
gid <- 1:(N/2)
gid <- c(gid, gid, gid, gid)
T <- rep(0, N)
T <- c(T, T + 1)
a <- qnorm(runif(N), mean = 0, sd = 0.005)
gp <- qnorm(runif(N/2), mean = 0, sd = 0.0005)
u <- qnorm(runif(N*2), mean = 0, sd = 0.05)
a <- c(a, a)
gp <- c(gp, gp, gp, gp)
Ylatent <- -0.05*T + a + u
Data <- data.frame(
  Y = ifelse(Ylatent > 0, 1, 0),
  id = id, gid = gid, T = T
)
library(clubSandwich)
library(plm)
fe.fit <- plm(formula = Y ~ T, data = Data, model = "within", index = "id",
              effect = "individual", singular.ok = FALSE)
vcovCR(fe.fit, cluster = Data$id, type = "CR2") # doesn't work, but I can run this by not specifying cluster as in the next line
vcovCR(fe.fit, type = "CR2")
vcovCR(fe.fit, cluster = Data$gid, type = "CR2") # I ultimately want to run this
Make your data a pdata.frame first. This is safer, especially if you want the time index to be created automatically (which seems to be the case, judging from your code).
Continuing what you have:
pData <- pdata.frame(Data, index = "id") # time index is created automatically
fe.fit2 <- plm(formula = Y ~ T, data = pData, model = "within", effect = "individual")
vcovCR(fe.fit2, cluster = Data$id, type = "CR2")
vcovCR(fe.fit2, type = "CR2")
vcovCR(fe.fit2, cluster = Data$gid, type = "CR2")
Your example does not work due to a bug in clubSandwich's data extraction function get_index_order (as of version 0.3.3) for plm objects: it assumes that both index variables are present in the original data, but that is not the case in your example, where the time index is created automatically because only the individual dimension is specified via the index argument.
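If you would rather not go through pdata.frame, another option, assuming (as described above) that the problem is only that the automatically created time index is missing from the original data, is to add an explicit time variable yourself and name both dimensions in index. The time variable and fe.fit3 below are names I am adding for illustration:
# rows 1..N are period 1 and rows (N+1)..2N are period 2 in this simulation
Data$time <- rep(1:2, each = N)
fe.fit3 <- plm(Y ~ T, data = Data, model = "within",
               index = c("id", "time"), effect = "individual")
vcovCR(fe.fit3, cluster = Data$gid, type = "CR2")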
Using data from the fivethirtyeight package...
library(fivethirtyeight)
grads <- college_recent_grads
I created a subset of the grads data containing the variables I want:
data <- grads[, c("men", "major_category", "employed",
                  "employed_fulltime_yearround", "p25th",
                  "p75th", "total")]
Then I split the data subset by major category and omitted the one NA value in the data:
majorcats <- split(data, data$major_category)
names(majorcats)
majorcats <- majorcats %>% na.omit()
I then tried to run a regression model inside a function called facts, where the user can specify x, y, and z, with z being a major category (which is why I split the data subset by major_category):
facts <- function(x, y, z){
  category <- majorcats[["z"]]
  summary(lm(y ~ x, data = category))
}
Unfortunately, when I try to pass variables that are part of the majorcats data set into facts, such as
facts(men, p25th, Arts)
I get the error below:
Error in model.frame.default(formula = y ~ x, data = category,
drop.unused.levels = TRUE) :
invalid type (NULL) for variable 'y'
Called from: model.frame.default(formula = y ~ x, data = category,
drop.unused.levels = TRUE)
Browse[1]>
Can someone please explain what this error means, and how I might be able to fix it?
Simply pass the parameters as string literals and create the formula from a string:
facts <- function(x, y, z){
  category <- majorcats[[z]]
  model <- as.formula(paste(y, "~", x))
  # ALTERNATIVE: model <- reformulate(x, response = y)
  summary(lm(model, data = category))
}
facts("men", "p25th", "Arts")
I'm having some problems with the predict function when using bayesglm. I've read some posts that say this problem may arise when the out of sample data has more levels than the in sample data, but I'm using the same data for the fit and predict functions. Predict works fine with regular glm, but not with bayesglm. Example:
control <- y ~ x1 + x2
# this works fine:
glmObject <- glm(control, myData, family = binomial())
predicted1 <- predict.glm(glmObject , myData, type = "response")
# this gives an error:
bayesglmObject <- bayesglm(control, myData, family = binomial())
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
Error in X[, piv, drop = FALSE] : subscript out of bounds
# Edit... I just discovered this works.
# Should I be concerned about using these results?
# Not sure why it fails when I specify the dataset
predicted3 <- predict(bayesglmObject, type = "response")
Can't figure out how to predict with a bayesglm object. Any ideas? Thanks!
One of the reasons could have to do with the default setting of the parameter "drop.unused.levels" in the bayesglm command. By default, this parameter is set to TRUE, so if there are unused levels, they get dropped during model building. However, the predict function still uses the original data, with the unused levels present in the factor variable. This creates a mismatch in levels between the data used for model building and the data used for prediction (even if it is the same data frame, myData in your case). I have given an example below:
n <- 100
x1 <- rnorm (n)
x2 <- as.factor(sample(c(1,2,3),n,replace = TRUE))
# Replacing 3 with 2 makes the level = 3 as unused
x2[x2==3] <- 2
y <- as.factor(sample(c(1,2),n,replace = TRUE))
myData <- data.frame(x1 = x1, x2 = x2, y = y)
control <- y ~ x1 + x2
# this works fine:
glmObject <- glm(control, myData, family = binomial())
predicted1 <- predict.glm(glmObject , myData, type = "response")
# this gives an error - this uses default drop.unused.levels = TRUE
bayesglmObject <- bayesglm(control, myData, family = binomial())
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
Error in X[, piv, drop = FALSE] : subscript out of bounds
# this works fine - value of drop.unused.levels is set to FALSE
bayesglmObject <- bayesglm(control, myData, family = binomial(), drop.unused.levels = FALSE)
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
I think a better way would be to use droplevels to drop the unused levels from the data frame beforehand and use it for both model building and prediction.
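A minimal sketch of that droplevels approach, applied to the same myData before fitting so that bayesglm and predict see identical factor levels:
# drop unused factor levels once, up front
myData2 <- droplevels(myData)
bayesglmObject2 <- bayesglm(control, myData2, family = binomial())
predicted4 <- predict(bayesglmObject2, myData2, type = "response")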
I have written a function to run phylogenetic generalized least squares, and everything looks like it should work fine, but for some reason, a specific variable which is defined in the script (W) keeps coming up as undefined. I have stared at this code for hours and cannot figure out where the problem is.
Any ideas?
myou <- function(alpha, datax, datay, tree){
  dat <- data.frame(datax[tree$tip.label, ], datay[tree$tip.label, ], row.names = tree$tip.label)
  colnames(dat) <- c("Trait1", "Trait2")
  W <- diag(vcv.phylo(tree)) # Weights
  fm <- gls(Trait1 ~ Trait2, data = dat,
            correlation = corMartins(alpha, tree, fixed = TRUE),
            weights = ~ W, method = "REML")
  return(as.numeric(fm$logLik))
}

corMartins2 <- function(datax, datay, tree){
  dat <- data.frame(datax[tree$tip.label, ], datay[tree$tip.label, ], row.names = tree$tip.label)
  colnames(dat) <- c("Trait1", "Trait2")
  result <- optimize(f = myou, interval = c(0, 4), datax = datax, datay = datay,
                     tree = tree, maximum = TRUE)
  W <- diag(vcv.phylo(tree)) # Weights
  fm <- gls(Trait1 ~ Trait2, data = dat,
            correlation = corMartins(result$maximum, tree, fixed = TRUE),
            weights = ~ W, method = "REML")
  list(fm, result$maximum)
}
#test
require(nlme)
require(phytools)
simtree <- rcoal(50)
dat1 <- as.data.frame(fastBM(simtree))
dat2 <- as.data.frame(fastBM(simtree))
corMartins2(dat1, dat2, tree = simtree)
returns "Error in eval(expr, envir, enclos) : object 'W' not found"
even though W is specifically defined!
Thanks!
The error is occurring in the gls calls in myou and corMartins2: you have to pass W in as a column of dat, because gls is looking for it there (when you give weights = ~ W as a formula like that, it looks for dat$W and can't find it).
Just change data = dat to data = cbind(dat, W = W) in both functions.
The example is not reproducible for me, as lowerB and upperB are not defined; however, perhaps the following will work for you, cbinding dat with W:
myou <- function(alpha, datax, datay, tree){
  dat <- data.frame(datax[tree$tip.label, ], datay[tree$tip.label, ], row.names = tree$tip.label)
  colnames(dat) <- c("Trait1", "Trait2")
  W <- diag(vcv.phylo(tree)) # Weights
  ### cbind W to dat
  dat <- cbind(dat, W = W)
  fm <- gls(Trait1 ~ Trait2, data = dat,
            correlation = corMartins(alpha, tree, fixed = TRUE),
            weights = ~ W, method = "REML")
  return(as.numeric(fm$logLik))
}

corMartins2 <- function(datax, datay, tree){
  dat <- data.frame(datax[tree$tip.label, ], datay[tree$tip.label, ], row.names = tree$tip.label)
  colnames(dat) <- c("Trait1", "Trait2")
  result <- optimize(f = myou, interval = c(lowerB, upperB), datax = datax, datay = datay,
                     tree = tree, maximum = TRUE)
  W <- diag(vcv.phylo(tree)) # Weights
  ### cbind W to dat
  dat <- cbind(dat, W = W)
  fm <- gls(Trait1 ~ Trait2, data = dat,
            correlation = corMartins(result$maximum, tree, fixed = TRUE),
            weights = ~ W, method = "REML")
  list(fm, result$maximum)
}
#test
require(nlme)
require(phytools)
simtree <- rcoal(50)
dat1 <- as.data.frame(fastBM(simtree))
dat2 <- as.data.frame(fastBM(simtree))
corMartins2(dat1, dat2, tree = simtree)
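For completeness, a tiny self-contained illustration of the pattern the fix relies on (the weighting variable living inside the data frame handed to gls); the toy data here is made up and has nothing to do with the phylogenetic example:
library(nlme)
set.seed(1)
toy <- data.frame(y = rnorm(20), x = rnorm(20), W = runif(20, 1, 2))
# W is a column of 'toy', so the one-sided weights formula can find it
fm <- gls(y ~ x, data = toy, weights = ~ W)
summary(fm)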