predict.MCMCglmm errors in R - r

I'm getting back into an old dataset, where I had multinomial mcmcglmm models. Understanding the posteriors for these models requires converting the formulas into predictions. I had been using' predict() in R in 2015-2016, and it was working in the following format, at that time:
pred <- predict.MCMCglmm(model, marginal = model$Random$formula, interval="prediction")
I'm now finding that the same line of code throws an error. To be specific: Error in if (!grepl("hu|zi|za|multinomial", object$Residual$family[i])) { :
argument is of length zero
I found this old communication indicating that it is because the family value is null, but when I use the solution model$Residual$family<-rep("gaussian", nrow(model$X)), I still get an error. This time it's Error in if (nat == 0) { : argument is of length zero
I am not understanding anything useful out of the traceback, but here it is just in case:
> 4. simulate.MCMCglmm(object = object, nsim = nrow(object$Sol), newdata = newdata, marginal = marginal, type = type, it = it, posterior = posterior, verbose = verbose)
> 3. simulate(object = object, nsim = nrow(object$Sol), newdata = newdata, marginal = marginal, type = type, it = it, posterior =
> posterior, verbose = verbose)
> 2. t(simulate(object = object, nsim = nrow(object$Sol), newdata = newdata, marginal = marginal, type = type, it = it, posterior =
> posterior, verbose = verbose))
> 1. predict.MCMCglmm(m1.Ed.all, marginal = m1.Ed.all$Random$formula, interval = "prediction")
I'm clearly missing something, but reading back through the documentation and googling has gotten me no further. I'm not even sure which argument is being referred to as length zero, here, because the family value is no longer blank. Although the fact that the error message changes suggests it does have to do with that value.
As an aside: if there's now a better, smoother way to plot predictions from a multinomial mcmcglmm I would be overjoyed to hear about it.

Related

Error while using the weights option in nlme in r

Sorry this is crossposting from https://stats.stackexchange.com/questions/593717/nlme-regression-with-weights-syntax-in-r, but I thought it might be more appropriate to post it here.
I am trying to fit a power curve to model some observations in an nlme. However, I know some observations to be less reliable than others (reliability of each OBSID reflected in the WEIV in the dummy data), relatively independent of variance, and I quantified this beforehand and wish to include it as weights in my model. Moreover, I know a part of my variance is correlated with my independent variable so I cannot use directly the variance as weights.
This is my model:
coeffs_start = lm(log(DEPV)~log(INDV), filter(testdummy10,DEPV!=0))$coefficients
nlme_fit <- nlme(DEPV ~ a*INDV^b,
data = testdummy10,
fixed=a+b~ 1,
random = a~ 1,
groups = ~ PARTID,
start = c(a=exp(coeffs_start[1]), b=coeffs_start[2]),
verbose = F,
method="REML",
weights=varFixed(~WEIV))
This is some sample dummy data (I know it is not a great fit but it's fake data anyway) : https://github.com/FlorianLeprevost/dummydata/blob/main/testdummy10.csv
This runs well without the "weights" argument, but when I add it I get this error and I am not sure why because I believe it is the correct syntax:
Error in recalc.varFunc(object[[i]], conLin) :
dims [product 52] do not match the length of object [220]
In addition: Warning message:
In conLin$Xy * varWeights(object) :
longer object length is not a multiple of shorter object length
Thanks in advance!
This looks like a very long-standing bug in nlme. I have a patched version on Github, which you can install via remotes::install_github() as below ...
remotes::install_github("bbolker/nlme")
testdummy10 <- read.csv("testdummy10.csv") |> subset(DEPV>0 & INDV>0)
coeffs_start <- coef(lm(log(DEPV)~log(INDV), testdummy10))
library(nlme)
nlme_fit <- nlme(DEPV ~ a*INDV^b,
data = testdummy10,
fixed=a+b~ 1,
random = a~ 1,
groups = ~ PARTID,
start = c(a=exp(coeffs_start[1]),
b=coeffs_start[2]),
verbose = FALSE,
method="REML",
weights=varFixed(~WEIV))
packageVersion("nlme") ## 3.1.160.9000

Extracting the relative influence from a gbm.fit object

I am trying to extract the relative influence of each variable from a gbm.fit object but it is coming up with the error below:
> summary(boost_cox, plotit = FALSE)
Error in data.frame(var = object$var.names[i], rel.inf = rel.inf[i]) :
row names contain missing values
The boost_cox object itself is fitted as follows:
boost_cox = gbm.fit(x = x,
y = y,
distribution="coxph",
verbose = FALSE,
keep.data = TRUE)
I have to use the gbm.fit function rather than the standard gbm function due to the large number of predictors (26k+)
I have solve this issue now myself.
The relative.influence() function can be used and works for objects created using both gbm() and gbm.fit(). However, it does not provide the plots as in the summary() function.
I hope this helps anyone else looking in the future.

Error with svyglm function in survey package in R: "all variables must be in design=argument"

New to stackoverflow. I'm working on a project with NHIS data, but I cannot get the svyglm function to work even for a simple, unadjusted logistic regression with a binary predictor and binary outcome variable (ultimately I'd like to use multiple categorical predictors, but one step at a time).
El_under_glm<-svyglm(ElUnder~SO2, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)
Error in eval(extras, data, env) :
object '.survey.prob.weights' not found
I changed the variables to 0 and 1 instead:
Under_narm$SO2REG<-ifelse(Under_narm$SO2=="Heterosexual", 0, 1)
Under_narm$ElUnderREG<-ifelse(Under_narm$ElUnder=="No", 0, 1)
But then get a different issue:
El_under_glm<-svyglm(ElUnderREG~SO2REG, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)
Error in svyglm.survey.design(ElUnderREG ~ SO2REG, design = SAMPdesign, :
all variables must be in design= argument
This is the design I'm using to account for the weights -- I'm pretty sure it's correct:
SAMPdesign=svydesign(data=Under_narm, id= ~NHISPID, weight= ~SAMPWEIGHT)
Any and all assistance appreciated! I've got a good grasp of stats but am a slow coder. Let me know if I can provide any other information.
Using some make-believe sample data I was able to get your model to run by setting rescale = TRUE. The documentation states
Rescaling of weights, to improve numerical stability. The default
rescales weights to sum to the sample size. Use FALSE to not rescale
weights.
So, one solution maybe is just to set rescale = TRUE.
library(survey)
# sample data
Under_narm <- data.frame(SO2 = factor(rep(1:2, 1000)),
ElUnder = sample(0:1, 1000, replace = TRUE),
NHISPID = paste0("id", 1:1000),
SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE))
# with 'rescale' = TRUE
SAMPdesign=svydesign(ids = ~NHISPID,
data=Under_narm,
weights = ~SAMPWEIGHT)
El_under_glm<-svyglm(formula = ElUnder~SO2,
design=SAMPdesign,
family=quasibinomial(), # this family avoids warnings
rescale=TRUE) # Weights rescaled to the sum of the sample size.
summary(El_under_glm, correlation = TRUE) # use correlation with summary()
Otherwise, looking code for this function's method with 'survey:::svyglm.survey.design', it seems like there may be a bug. I could be wrong, but by my read when 'rescale' is FALSE, .survey.prob.weights does not appear to get assigned a value.
if (is.null(g$weights))
g$weights <- quote(.survey.prob.weights)
else g$weights <- bquote(.survey.prob.weights * .(g$weights)) # bug?
g$data <- quote(data)
g[[1]] <- quote(glm)
if (rescale)
data$.survey.prob.weights <- (1/design$prob)/mean(1/design$prob)
There may be a work around if you assign a vector of numeric values to .survey.prob.weights in the global environment. No idea what these values should be, but your error goes away if you do something like the following. (.survey.prob.weights needs to be double the length of the data.)
SAMPdesign=svydesign(ids = ~NHISPID,
data=Under_narm,
weights = ~SAMPWEIGHT)
.survey.prob.weights <- rep(1, 2000)
El_under_glm<-svyglm(formula = ElUnder~SO2,
design=SAMPdesign,
family=quasibinomial(),
rescale=FALSE)
summary(El_under_glm, correlation = TRUE)

Unused argument error when building a Confusion Matrix in R

I am currently trying to run Logistic Regression model on my DF.
While I was creating a new modelframe with the actual and predicted values i get get the following error message.
Error
Error in confusionMatrix(as.factor(log_class), lgtest$Satisfaction, positive = "satisfied") :
unused argument (positive = "satisfied")
This is my model:
#### Logistic regression model
log_model = glm(Satisfaction~., data = lgtrain, family = "binomial")
summary(log_model)
log_preds = predict(log_model, lgtest[,1:22], type = "response")
head(log_preds)
log_class = array(c(99))
for (i in 1:length(log_preds)){
if(log_preds[i]>0.5){
log_class[i]="satisfied"}else{log_class[i]="neutral or dissatisfied"}}
### Creating a new modelframe containing the actual and predicted values.
log_result = data.frame(Actual = lgtest$Satisfaction, Prediction = log_class)
lgtest$Satisfaction = factor(lgtest$Satisfaction, c(1,0),labels=c("satisfied","neutral or dissatisfied"))
lgtest
confusionMatrix(log_class, log_preds, threshold = 0.5) ####this works
mr1 = confusionMatrix(as.factor(log_class),lgtest$Satisfaction, positive = "satisfied") ## this is the line that causes the error
I had same problem. I typed "?confusionMatrix" and take this output:
Help on topic 'confusionMatrix' was found in the following packages:
confusionMatrix
(in package InformationValue in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Create a confusion matrix
(in package caret in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Confusion Matrix
(in package ModelMetrics in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
As we can understand from here, since it is in more than one package, we need to specify which package we want to use.
So I typed code with "caret::confusionMatrix(...)" and it worked!
This is how we can write the code to get rid of argument error when building a confusion matrix in R
caret::confusionMatrix(
data = new_tree_predict$predicted,
reference = new_tree_predict$actual,
positive = "True"
)

Fail to predict woe in R

I used this formula to get woe with
library("woe")
woe.object <- woe(data, Dependent="target", FALSE,
Independent="shop_id", C_Bin=20, Bad=0, Good=1)
Then I want to predict woe for the test data
test.woe <- predict(woe.object, newdata = test, replace = TRUE)
And it gives me an error
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "data.frame"
Any suggestions please?
For prediction, you cannot do it with the package woe. You need to use the package. Take note of the masking of the function woe, see below:
#let's say we woe and then klaR was loaded
library(klaR)
data = data.frame(target=sample(0:1,100,replace=TRUE),
shop_id = sample(1:3,100,replace=TRUE),
another_var = sample(letters[1:3],100,replace=TRUE))
#make sure both dependent and independent are factors
data$target=factor(data$target)
data$shop_id = factor(data$shop_id)
data$another_var = factor(data$another_var)
You need two or more dependent variables:
woemodel <- klaR::woe(target~ shop_id+another_var,
data = data)
If you only provide one, you have an error:
woemodel <- klaR::woe(target~ shop_id,
data = data)
Error in woe.default(x, grouping, weights = weights, ...) : All
factors with unique levels. No woes calculated! In addition: Warning
message: In woe.default(x, grouping, weights = weights, ...) : Only
one single input variable. Variable name in resulting object$woe is
only conserved in formula call.
If you want to predict the dependent variable with only one independent, something like logistic regression will work:
mdl = glm(target ~ shop_id,data=data,family="binomial")
prob = predict(mdl,data,type="response")
predicted_label = ifelse(prob>0.5,levels(data$target)[1],levels(data$target)[0])

Resources