I've fitted a logistic regression with the glm function:
mynewlogit <- glm(is_bad ~ ulmp_s_ratio + plmp_mac_all_60d + plmp_est_mac_all_90d + plmp_c_mac_all_90d + lmp_s_ratio + plmp_c_mac_hrsk_60d + ulmp_c_ycount + pp_usr_lmp_count + l2pp_pp_age_min + lmp_c_ratio + lmp_age_max + lmp_age_avg
, data = rajsub, family = "binomial")
and I've got this result:
Coefficients:
(Intercept) ulmp_s_ratio plmp_mac_all_60d plmp_est_mac_all_90d
-1.6226917 1.8704011 0.1037387 0.1583566
plmp_c_mac_all_90d lmp_s_ratio plmp_c_mac_hrsk_60d ulmp_c_ycount
-0.1333490 1.0456631 1.1447296 1.6073142
pp_usr_lmp_count l2pp_pp_age_min lmp_c_ratio lmp_age_max
0.0404034 0.0000457 -0.1052236 0.0002902
lmp_age_avg
-0.0010493
How can I present the result as an equation, like this:
x = -1.6226917 + 1.8704011*ulmp_s_ratio ...
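One way to get that string is to paste the names and values returned by coef() together. The following is a minimal sketch, not a definitive implementation: the helper name coef_to_equation is hypothetical, the model is fitted on the built-in mtcars data purely for illustration, and it assumes the intercept is the first coefficient and that no coefficient is NA.

```r
# Hypothetical helper: turn the coefficients of a fitted model into an
# equation string such as "x = b0 + b1*var1 - b2*var2".
# Assumes the intercept comes first and no coefficient is NA.
coef_to_equation <- function(model, lhs = "x", digits = 7) {
  cf <- coef(model)
  slopes <- cf[-1]
  # Use " - " for negative slopes and " + " otherwise, then print |slope|
  signs <- ifelse(slopes < 0, " - ", " + ")
  rhs <- paste0(signs,
                formatC(abs(slopes), format = "f", digits = digits),
                "*", names(slopes),
                collapse = "")
  paste0(lhs, " = ", formatC(cf[[1]], format = "f", digits = digits), rhs)
}

# Illustration on built-in data (not the data from the question)
fit <- glm(am ~ wt + hp, data = mtcars, family = "binomial")
eq <- coef_to_equation(fit)
cat(eq, "\n")
```

With the model from the question, coef_to_equation(mynewlogit) should give the same kind of string. Note that for a binomial glm the left-hand side is the log-odds of is_bad, not the probability itself.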
In R, to include all variables in a GLM you can simply use a . in the formula, as shown in "How to succinctly write a formula with many variables from a data frame?"
For example:
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
However, I am struggling to do this with svydesign. I have many explanatory variables plus an ID and a weight variable, so first I create my survey design:
des <-svydesign(ids=~id, weights=~wt, data = df)
Then I try creating my binomial model using weights:
binom <- svyglm(y~.,design = des, family="binomial")
But I get the error:
Error in svyglm.survey.design(y ~ ., design = des, family = "binomial") :
all variables must be in design = argument
What am I doing wrong?
You typically wouldn't want to do this, because "all the variables" would include design metadata such as weights, cluster indicators, stratum indicators, etc.
You can use colnames() to extract all the variable names from a design object and then reformulate(), probably after subsetting the names, e.g. with the api example from the survey package:
> all_the_names <- colnames(dclus1)
> all_the_actual_variables <- all_the_names[c(2, 11:37)]
> reformulate(all_the_actual_variables,"y")
y ~ stype + pcttest + api00 + api99 + target + growth + sch.wide +
comp.imp + both + awards + meals + ell + yr.rnd + mobility +
acs.k3 + acs.46 + acs.core + pct.resp + not.hsg + hsg + some.col +
col.grad + grad.sch + avg.ed + full + emer + enroll + api.stu
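Applied to the setup in the question, the same idea could look like the sketch below. The data frame is simulated here purely for illustration (its column names id, wt, y, x1 to x3 are assumptions), the design columns are dropped by name rather than by position, and the svyglm() call only runs if the survey package is installed. quasibinomial() is used instead of "binomial" to avoid warnings about non-integer counts with weights.

```r
# Simulated stand-in for the question's df: id and wt are design metadata,
# y is the binary response, everything else is a predictor.
set.seed(1)
df <- data.frame(id = 1:50,
                 wt = runif(50, 1, 3),
                 y  = rbinom(50, 1, 0.4),
                 x1 = rnorm(50),
                 x2 = rnorm(50),
                 x3 = rnorm(50))

# Keep every column except the design variables and the response
predictors <- setdiff(names(df), c("id", "wt", "y"))
f <- reformulate(predictors, response = "y")
print(f)

# Fit the weighted model only if the survey package is available
if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(ids = ~id, weights = ~wt, data = df)
  binom <- survey::svyglm(f, design = des, family = quasibinomial())
}
```

Dropping columns by name is a little more robust than c(2, 11:37) if the column order of the data ever changes.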
I have created a logistic regression model and a corresponding ROC curve using pROC. I got the threshold for the "best" value that maximizes sensitivity and specificity. The predictor is a score that ranges from 4 to 13 points. The predicted variable is survival. I need to know which value of my score (from 4 to 13) corresponds to the threshold value (e.g. 0.043). I would appreciate your help.
The code looks like this:
#MULTIVARIATE ANALYSIS
summary(glm((VIVO_AL_ALTA==0)~ CALCULADORA_CALL_SCORE +
SOPORTE_VENT_AL_INGRESO + ETE_DURANTE_HOSP +
SOBREINFECC_BACT_DURANTE_HOSP + COP_DURANTE_HOSP + TOCILIZUMAB +
CORTICOIDES_HOSP, family = binomial, data = work_data))
exp(coef(glm((VIVO_AL_ALTA==0)~ CALCULADORA_CALL_SCORE +
SOPORTE_VENT_AL_INGRESO + ETE_DURANTE_HOSP +
SOBREINFECC_BACT_DURANTE_HOSP + COP_DURANTE_HOSP + TOCILIZUMAB +
CORTICOIDES_HOSP , family = binomial, data = work_data)))
exp(confint.default(glm((VIVO_AL_ALTA==0) ~
CALCULADORA_CALL_SCORE + SOPORTE_VENT_AL_INGRESO +
ETE_DURANTE_HOSP + SOBREINFECC_BACT_DURANTE_HOSP +
COP_DURANTE_HOSP + TOCILIZUMAB + CORTICOIDES_HOSP , family=
binomial, data= work_data), level = .95))
mod_vivo_alta_multi<-glm((VIVO_AL_ALTA==0)~
CALCULADORA_CALL_SCORE + SOPORTE_VENT_AL_INGRESO +
ETE_DURANTE_HOSP + SOBREINFECC_BACT_DURANTE_HOSP +
COP_DURANTE_HOSP + TOCILIZUMAB + CORTICOIDES_HOSP, family =
binomial, data = work_data, na.action = "na.exclude")
#ROC Curves
library(pROC)
#ROC VIVO ALTA MULTI
work_data$pred_vivo_alta_multi<-predict(mod_vivo_alta_multi, type
= "response", na.action = "na.omit")
pROC_obj_pred_vivo_alta_multi <-
roc((work_data$VIVO_AL_ALTA==0),work_data$pred_vivo_alta_multi,
smoothed = TRUE, direction="<",
# arguments for ci
ci=TRUE, ci.alpha=0.95,
stratified=FALSE,
# arguments for plot
plot=TRUE, auc.polygon=F,
max.auc.polygon=TRUE, grid=TRUE,
print.thres=T,
print.auc=TRUE, show.thres=TRUE)
coords(pROC_obj_pred_vivo_alta_multi,x= "best",
input="threshold", ret=c("threshold", "specificity",
"sensitivity", "npv", "ppv","youden"), as.list=FALSE, drop=TRUE,
best.method=c("youden"), best.weights=c(1, 0.5), transpose =
FALSE, as.matrix=FALSE)
Sorry that some of the variable names are in Spanish. The predictors in the model are CALCULADORA_CALL_SCORE, SOPORTE_VENT_AL_INGRESO, ETE_DURANTE_HOSP, SOBREINFECC_BACT_DURANTE_HOSP, COP_DURANTE_HOSP, TOCILIZUMAB and CORTICOIDES_HOSP.
The predicted variable is VIVO_AL_ALTA.
You can use coords(my_roc, "best").
Here's a demonstration using a built in example from pROC
library(pROC)
ROC <- roc(aSAH$outcome, aSAH$s100b, levels=c("Good", "Poor"))
#> Setting direction: controls < cases
coords(ROC, "best")
#> threshold specificity sensitivity
#> 1 0.205 0.8055556 0.6341463
Created on 2022-05-20 by the reprex package (v2.0.1)
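Note that when the ROC curve is built on predicted probabilities, the "best" threshold is a probability (such as 0.043), not a score. If the score is the only predictor, the logistic model can be inverted: since p = plogis(b0 + b1*score), the score at a given probability is (qlogis(p) - b0) / b1. With several predictors, as in the multivariate model in the question, no single score value corresponds to one probability, so building the ROC directly on the score is probably the simpler route. A minimal sketch on simulated data (all names and numbers below are made up for illustration):

```r
# Simulated data: a score from 4 to 13 and a binary outcome
set.seed(42)
score <- sample(4:13, 200, replace = TRUE)
died  <- rbinom(200, 1, plogis(-6 + 0.5 * score))

# Single-predictor logistic model
fit <- glm(died ~ score, family = binomial)
b <- coef(fit)

# Probability threshold, e.g. as reported by coords(..., "best")
p_threshold <- 0.043

# Invert p = plogis(b0 + b1 * score) to get the score at that probability
score_threshold <- (qlogis(p_threshold) - b[["(Intercept)"]]) / b[["score"]]
score_threshold
```

The result is generally not an integer and can fall outside the observed score range. Which way to round it is a judgment call that depends on whether you want to favor sensitivity or specificity.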
I want to shorten this expression in my R code:
model1 <- pglm::pglm(formula = lfp ~ lfp_1 + lfp1 + kids + kids2 + kids3 + kids4 + kids5 + lhinc + lhinc2 + lhinc3 + lhinc4 + lhinc5 + educ + black + age + agesq + per2 + per3 + per4 + per5,
family = binomial("probit"),
data = lfp1,
model = "random")
In Stata I would write kids2-kids5 to include the variables kids2 through kids5 in the regression.
The same goes for lhinc2-lhinc5 and per2-per5.
Try this one:
model1 <- pglm::pglm(formula = lfp ~.,
family = binomial("probit"),
data = lfp1,
model = "random")
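Keep in mind that a . on the right-hand side stands for every other column in the data, which may include columns you do not want as regressors. If that is a concern, a closer analogue of Stata's kids2-kids5 is to generate the names with paste0() and build the formula with reformulate(). A minimal sketch using only the variable names from the question:

```r
# Build the numbered variable names instead of typing them out
vars <- c("lfp_1", "lfp1",
          "kids",  paste0("kids", 2:5),
          "lhinc", paste0("lhinc", 2:5),
          "educ", "black", "age", "agesq",
          paste0("per", 2:5))

# Turn the character vector into a formula with lfp as the response
f <- reformulate(vars, response = "lfp")
print(f)

# The formula can then be passed on as in the question, e.g.
# model1 <- pglm::pglm(formula = f, family = binomial("probit"),
#                      data = lfp1, model = "random")
```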
I'm running an lme analysis on my dataset with the following code:
M1 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, random = ~ 1 + visit |id, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
and I get the following error message:
Error in model.frame.default(formula = ~visit + sx + agevis + c_bmi +
: attempt to apply non-function
I am not sure what I am doing wrong or how to get the model to run. I really appreciate an answer. Thank you.
I am trying to run a linear mixed effect model with VT as my dependent variable, visit as my time variable, with a 1st order autoregressive correlation, ML estimator on data with some missing observations.
I have tried changing the code in the following ways but got the same error message
library(nlme)
?lme
fm2 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, random = ~ 1|id, corAR1(),method = "ML", na.action = na.pass(Cleaned_data4t300919))
fm2 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + sfnMH + ethn, data = Cleaned_data4t300919, random = ~ 1 + visit |cenid, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
fm2 <- lme(VT ~ visit + sx + agevis , data = Cleaned_data4t300919, random = ~ 1 + visit |id, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
fm2 <- lme(VT~visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, na.action = na.exclude(Cleaned_data4t300919))
fm2 <- lme(formula= sfnVT ~ visit + sx + agevis , data = Cleaned_data4t300919, random = ~ 1 + visit |cenid, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
I would like to obtain the estimates from the model and plot them using ggplot.
na.action = na.omit(Cleaned_data4t300919)
and similar attempts are the problem I think.
From ?lme:
na.action: a function that indicates what should happen when the data
contain 'NA's
You are providing data, not a function, since na.omit(dataset) returns a data.frame with NA containing rows removed, rather than something that can be applied to the data= specified. Just:
na.action=na.omit
or similar na.* functions will be sufficient.
A way to identify these kinds of issues for sure is to use ?debug - debug(lme) then step through the function line-by-line to see exactly what the error is in response to.
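For reference, here is a minimal sketch of a call with na.action given as a function. It uses the Orthodont data that ships with nlme rather than the data from the question, with two missing values introduced artificially, and it names the correlation argument explicitly instead of passing corAR1() by position.

```r
library(nlme)

# Stand-in data: Orthodont from nlme, with two responses set to NA
d <- as.data.frame(Orthodont)
d$distance[c(3, 10)] <- NA

# na.action takes the function itself, not the result of calling it
fit <- lme(distance ~ age,
           data = d,
           random = ~ 1 | Subject,
           correlation = corAR1(),
           method = "ML",
           na.action = na.omit)

# Rows with missing values are dropped before fitting
fit$dims$N
fixef(fit)
```

The fixed-effect estimates from fixef(fit) or summary(fit)$tTable can then be put in a data frame for plotting with ggplot.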
I need to make a prediction of a soil variable as a function of auxiliary variables in the georob package.
My soil dataset has 200 observations and my set of auxiliary variables has 19,940 observations. However, I can't work out how to supply the coordinates of the auxiliary variables as prediction points in the code.
dat= read.csv("malhas amostrais/solo_200.csv", sep = ",")
covar = read.csv("../dados/csv/variaveis_auxiliares.csv", sep = ";")
ku_georob_cpeso <- georob(argila ~ CV + CH + dist_bebedouros + Eca_0.5m + Eca_1m + elevacao + IH_0.5m + sd_ndvi_01 + sd_ndvi_02 + twi + S_P_T + sd_b4 +sd_b5 + sd_b6+ sd_b7,
data= dat,
locations= ~ x + y,
variogram.model="RMexp",
param=c(variance=200, nugget=600, scale=150),
verbose = 3,
psi.func = "huber")
ku_georob_cpeso <- georob(argila ~ CV + CH + dist_bebedouros + Eca_0.5m + Eca_1m + elevacao + IH_0.5m + sd_ndvi_01 + sd_ndvi_02 + twi + S_P_T + sd_b4 +sd_b5 + sd_b6+ sd_b7,
data= dat1,
subset = cova,
locations= ~ x + y,
variogram.model="RMexp",
param=c(variance=200, nugget=600, scale=150),
verbose = 3,
psi.func = "huber")
I receive the error:
Error in xj[i] : invalid subscript type 'list'
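The message itself suggests that something expecting an index vector received a list. A data frame is a list, so passing one as subset is a likely cause (this is an inference from the error text, not something confirmed against georob). The usual R pattern is to fit on the sampled points only and then hand the prediction locations to predict() via newdata. The sketch below reproduces the message in base R and shows that pattern with lm() as a stand-in. All data and names in it are simulated, and it assumes that predict() for a georob fit accepts newdata in the same way.

```r
# Indexing with a list reproduces the error message from the question
msg <- tryCatch((1:5)[list(1, 2)], error = function(e) conditionMessage(e))
msg

# Stand-in example with lm(): fit on the sampled points ...
set.seed(1)
dat_demo <- data.frame(x = runif(20), y = runif(20), elev = rnorm(20))
dat_demo$clay <- 30 + 2 * dat_demo$elev + rnorm(20)
fit_demo <- lm(clay ~ elev, data = dat_demo)

# ... then predict at the locations that carry the auxiliary variables
grid_demo <- data.frame(x = runif(100), y = runif(100), elev = rnorm(100))
pred_demo <- predict(fit_demo, newdata = grid_demo)
length(pred_demo)
```

For the question's data this would mean fitting georob with data = dat and no subset argument, then calling predict() on the fit with newdata = covar, provided covar contains x, y and every covariate used in the formula.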