Logistic regression outcome variable predictions in R

I'm using logistic regression to predict a binary outcome variable (Group, 0/1).
I've noticed something: I have two variables representing the same outcome. One is coded simply as "0" or "1":
> df$Group
>[1] 0 1 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1
> 0 0 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
> [59] 1 1 1 1 1 1 0 1 0 0 1 1 0 0 1 1 1 0 1 1 0 1 1 1 0 1 0 1 0 1 1 1 1
> 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0
>[117] 0 0 0 1 1 1 1
> 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1 0
> 0 1 1 1 1 0 1 1 0 1 1 1 1 0 0 1
>[175] 1 0 1
>Levels: 0 1
> is.factor(df$Group)
> [1] TRUE
The same goes for the other one, which represents the same thing but has "name" labels:
> df$Group2
>[1] CON CI CON CI CI CON CI
> CI CON CI CI CI CON CI
>[15] CI etc.
> Levels: CI CON
> is.factor(df$Group2)
> [1] TRUE
> contrasts(df$Group2)
> CI 0
> CON 1
In the first variable, 0 = CON and 1 = CI. I created that numerical variable because I wanted CI to be my "1" group and CON the "0" reference group, but every time I used as.factor on the original data, CI became level 1 and CON level 2.
I thought they were the same thing, but when I plotted the odds ratios using the sjPlot package and checked to be sure, I noticed that the ORs were quite different, although the coefficients in summary() of the glm model looked the same (apart from the sign of the estimates, which makes sense as the two groups are coded differently). Specifically, the plotted ORs are definitely bigger when using the numerical variable and smaller when using the "name" variable.
Am I missing something in my understanding of R (I'm self-taught) or in the computation of logistic regression? Which of the two variables should I use in logistic regression? And how can I change the fact that, for the "name" variable, R uses "CI" as the 0 reference group instead of CON? Thank you.

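By default, R uses alphabetical order for the levels of a factor, so CI comes before CON. With the opposite reference level each coefficient flips sign, so each odds ratio becomes its reciprocal (exp(-b) = 1/exp(b)), which is why the plotted ORs look different. You can set your own level order with
df$Group2 <- factor(df$Group2, levels=c('CON','CI'))
Then CON will be used as the reference level in the logistic regression, and you should get the same results as with the 0/1 coding.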
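A minimal sketch of the effect on simulated data may help. The data and the names x, group01, group_alpha and group_con_ref are invented for illustration and are not the asker's variables. It fits the same model with three codings of the outcome and compares the odds ratios:

```r
# Simulated data (illustrative only): 1 = CI, 0 = CON
set.seed(1)
n <- 200
x <- rnorm(n)
group01 <- rbinom(n, size = 1, prob = plogis(0.5 + 0.8 * x))

# Default factor(): levels are sorted alphabetically, so CI is the first (reference) level
group_alpha <- factor(ifelse(group01 == 1, "CI", "CON"))
# Explicit level order: CON is the reference level, CI is modelled as the "1" outcome
group_con_ref <- factor(group_alpha, levels = c("CON", "CI"))

fit01     <- glm(group01 ~ x, family = binomial)
fit_alpha <- glm(group_alpha ~ x, family = binomial)
fit_con   <- glm(group_con_ref ~ x, family = binomial)

# Odds ratios: fit01 and fit_con agree; fit_alpha gives the reciprocals
exp(coef(fit01))
exp(coef(fit_con))
exp(coef(fit_alpha))
```

For a factor response, glm() treats the first level as "failure" and the other level as "success", so only the level order changes which group is being predicted. relevel(group_alpha, ref = "CON") should be an equivalent way to set the reference level.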

Related

MICE not imputing all variables with missing values

I'm struggling to get mice to impute all the variables with missing values in my dataset. It's working perfectly for 4 of the variables, but not 3 others (and I'm getting the 3 logged events, which I suspect correspond to the 3 in question: GCSPupils, Hypoxia, Hypotension), but I can't figure out the issue. There seems to be variability in those variables (not constants), so mice should work. I want to do single imputation of 7 variables (the other variables have complete data).
# We run the mice code with 0 iterations
imp <- mice(TXAIMPACT_final, maxit = 0)
# Extract predictor Matrix and methods of imputation
predM <- imp$predictorMatrix
meth <- imp$method
#Setting values of variables I'd like to leave out to 0 in the predictor matrix
predM[,c("subjectId")] <- 0
# Specify a separate imputation model for variables of interest
# Dichotomous variable
log <- c("Hypotension", "Hypoxia")
# Unordered categorical variable
poly2 <- c("GCSPupils", "GCSMotor")
# Turn their methods matrix into the specified imputation models
meth[log] <- "logreg"
meth[poly2] <- "polyreg"
Here, I check to make sure "meth" is correct, and it is:
meth
  subjectId         Age         GCS    GCSMotor   GCSPupils     Glucose  Hemoglobin
         ""          ""          ""   "polyreg"   "polyreg"       "pmm"       "pmm"
Hypotension     Hypoxia  MarshallCT         SAH         EDH         GOS        GFAP
   "logreg"    "logreg"       "pmm"          ""          ""          ""          ""
The methods are all correct as I specified. I do notice something funny about the Predictor Matrix, which is that the 3 variables not imputing only show "0" for their columns:
predM
            subjectId Age GCS GCSMotor GCSPupils Glucose Hemoglobin Hypotension Hypoxia MarshallCT SAH EDH GOS GFAP
subjectId           0   1   1        1         0       1          1           0       0          1   1   1   1    1
Age                 0   0   1        1         0       1          1           0       0          1   1   1   1    1
GCS                 0   1   0        1         0       1          1           0       0          1   1   1   1    1
GCSMotor            0   1   1        0         0       1          1           0       0          1   1   1   1    1
GCSPupils           0   0   0        0         0       0          0           0       0          0   0   0   0    0
Glucose             0   1   1        1         0       0          1           0       0          1   1   1   1    1
Hemoglobin          0   1   1        1         0       1          0           0       0          1   1   1   1    1
Hypotension         0   0   0        0         0       0          0           0       0          0   0   0   0    0
Hypoxia             0   0   0        0         0       0          0           0       0          0   0   0   0    0
MarshallCT          0   1   1        1         0       1          1           0       0          0   1   1   1    1
SAH                 0   1   1        1         0       1          1           0       0          1   0   1   1    1
EDH                 0   1   1        1         0       1          1           0       0          1   1   0   1    1
GOS                 0   1   1        1         0       1          1           0       0          1   1   1   0    1
GFAP                0   1   1        1         0       1          1           0       0          1   1   1   1    0
I think this is the problem, but I'm not sure how to solve it. Finally, here is my single imputation:
imp2 <- complete(mice(TXAIMPACT_final, maxit = 1,
                      predictorMatrix = predM,
                      method = meth, print = TRUE))
iter imp variable
1 1 GCSMotor Glucose Hemoglobin MarshallCT
1 2 GCSMotor Glucose Hemoglobin MarshallCT
1 3 GCSMotor Glucose Hemoglobin MarshallCT
1 4 GCSMotor Glucose Hemoglobin MarshallCT
1 5 GCSMotor Glucose Hemoglobin MarshallCT
Warning: Number of logged events: 3
Thanks in advance!
Figured it out; posting here in case someone else has this issue. The variables that were not imputing were stored as character vectors, which blocked imputation. As soon as I converted them to numeric, the problem disappeared.
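A quick way to spot this kind of problem before calling mice() is to check the class of every column. The sketch below uses a small made-up data frame (dat, with invented values; only the column names are borrowed from the post) and base R only. Converting to factor is an assumption that suits categorical methods; the author converted to numeric instead.

```r
# Hypothetical stand-in for the asker's data; the values are invented
dat <- data.frame(
  Age         = c(34, 51, 27, 63),
  Hypoxia     = c("0", "1", NA, "0"),
  Hypotension = c("1", NA, "0", "0"),
  GCSPupils   = c("2", "1", NA, "0"),
  stringsAsFactors = FALSE
)

# Which columns are stored as character?
char_cols <- names(dat)[sapply(dat, is.character)]
char_cols

# Convert them to factors; missing values stay NA.
# The numeric route taken in the post would be:
#   dat[char_cols] <- lapply(dat[char_cols], as.numeric)
dat[char_cols] <- lapply(dat[char_cols], factor)

# Confirm the new classes
sapply(dat, class)
```

If variables are still skipped afterwards, the loggedEvents component of the object returned by mice() should list what was dropped and why.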

I need help putting values from one vector into another in R

I have two vectors in R
Vector 1
0 0 0 0 0 0 0 0 0 0
Vector 2
1 1 3 1 1 1 1 1
I need to put the values from vector 2 into vector 1 but into specific positions so that vector 1 becomes
1 1 3 0 0 1 1 1 1 1
I need to do this in one line of code. I tried doing:
vector1[1:3,6:10] = vector2[1:3,4:8]
but I am getting the error "incorrect number of dimensions".
Is it possible to do this?
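A vector has only one dimension, so a comma inside [ ] is read as matrix-style row/column indexing, hence the error. Combine the positions with c() instead:
vector1[c(1:3,6:10)] = vector2[c(1:3,4:8)]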
> vector1
[1] 1 1 3 0 0 1 1 1 1 1
Alternatively, use negative indexing to fill every position except 4 and 5:
vector1[-(4:5)] <- vector2
vector1
[1] 1 1 3 0 0 1 1 1 1 1

How can I create a random dummy variable of 2000 observations in R and Stata?

I want to create a random dummy (1 and 0) variable in R or Stata. How can I make, for example, 70% of the observations 1 and the rest 0? Thanks
If you wanted exactly 70% of 1s (or any other percentage), but a random ordering of the elements, you can use this function.
random_binary <- function(n, p){
# p is the proportion of 1s
x <- c(rep(1, times=n * p), rep(0, times=n * (1 - p)))
x[sample(length(x))] # or sample(x)
}
random_binary(10, 0.7)
#[1] 1 0 1 1 0 0 1 1 1 1
The times argument of rep can be non-integer, as mentioned in the documentation.
?rep
times
A double vector is accepted, other inputs being coerced to an integer
or double vector.
But note that you may not get exactly the percentage desired (but as close as possible).
An alternative is to use rbinom, since we're effectively sampling from the binomial distribution.
rbinom(10, size=1, prob=0.7)
# [1] 0 0 0 0 1 1 1 0 1 0
This is similar to sample with the prob argument and, as shown above, is not guaranteed to return exactly 70% of 1s.
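Because rep truncates a non-integer times towards zero, a product such as n * p that lands a hair below the intended whole number in floating point can lose one element. A hedged variant of the function above (the name random_binary_exact is made up here) rounds the count of 1s first and derives the count of 0s from it, so the result always has length n:

```r
# Sketch: exact number of 1s (rounded to the nearest integer), random order
random_binary_exact <- function(n, p) {
  n_ones <- round(n * p)                        # round instead of truncating
  x <- rep(c(1, 0), times = c(n_ones, n - n_ones))
  x[sample.int(length(x))]                      # random permutation
}

set.seed(42)
x <- random_binary_exact(2000, 0.7)
sum(x)     # 1400
length(x)  # 2000
```

When n * p is not a whole number, the proportion of 1s is only the closest achievable value, as noted above.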
Here's an approach with sample from base R:
sample(c(1,0), size = 2000, prob = c(0.7,0.3), replace = TRUE)
# [1] 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1
#[58] 1 1 1 1 0 1 1 0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1
As @Ben Bolker points out in the comments, it would be unusual for exactly 1400 to be 1.
This approach will result in exactly 1400 1s:
sample(rep(c(1,0),c(1400,600)), 2000)
In Stata, for exactly 70% 1s and 30% 0s:
set obs 2000
set seed 1606
gen wanted = cond(_n <= 1400, 1, 0)
gen random = runiform()
sort random
For approximately 70% 1s and 30% 0s
gen better = runiform() < 0.7

How do you sum different columns of binary variables based on a desired set of variables/column?

I used the code below for a total of 25 variables and it worked. Each new variable is either 1 or 0:
jb$Finances <- ifelse(grepl("Finances", jb$content.roll),1,0)
I want to add up the number of 1s in each row across a selection of the columns I just made (using the code above) and store the result in another column called "sum.content". I used the code below:
jb <- jb %>%
mutate(sum.content=sum(jb$Finances,jb$Exercise,jb$Volunteer,jb$Relationships,jb$Laugh,jb$Gratitude,jb$Regrets,jb$Meditate,jb$Clutter))
I didn't get an error using the code above, but I did not get the outcome I wanted.
The result was 14 for all my rows. I was expecting something no greater than 9, since I only selected 9 variables. I don't want to delete the other variables like V1 and V2; I just want to sum some of the variables.
This is what I got using the code:
V1 V2... Finances Exercise Volunteer Relationships Laugh sum.content
1 1 1 1 1 0 14
2 0 1 0 0 1 14
2 0 0 0 0 1 14
This is What I want:
V1 V2... Finances Exercise Volunteer Relationships Laugh sum.content
1 1 1 1 1 0 4
2 0 1 0 0 1 2
2 0 0 0 0 1 1
I want R to add up the number of 1s in each row (within the columns I select). How would I write code that sums the 1s across a chosen set of variables/columns?
Here is an answer that uses dplyr to sum across rows of variables starting with the letter V. We'll simulate some data, convert to binary, and then sum the rows.
library(dplyr)  # provides %>%, mutate(), select() and starts_with()
data <- matrix(rnorm(100,100,30),nrow = 10)
# recode to binary: 1 if the value is above 100, otherwise 0
data <- apply(data,2,function(x) ifelse(x > 100,1,0))
# change some of the column names to illustrate impact of
# select() within mutate()
colnames(data) <- c(paste0("V",1:5),paste0("X",1:5))
as.data.frame(data) %>%
  mutate(total = select(.,starts_with("V")) %>% rowSums)
...and the output, where the sums should equal the sum of V1 - V5 but not X1 - X5:
   V1 V2 V3 V4 V5 X1 X2 X3 X4 X5 total
1   1  0  0  0  1  0  0  0  1  0     2
2   1  0  0  1  0  0  0  1  1  0     2
3   1  1  1  0  1  0  0  0  1  0     4
4   0  0  1  1  0  1  0  0  1  0     2
5   0  0  1  0  1  0  1  1  1  0     2
6   0  1  1  0  1  0  0  1  1  1     3
7   1  0  1  1  0  0  0  0  0  1     3
8   1  0  0  1  1  1  0  1  1  1     3
9   1  1  0  0  1  0  1  1  0  0     3
10  0  1  1  0  1  1  0  0  1  0     3
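The question's sum() call returned one number for every row because sum() collapses all of its arguments into a single grand total. To get one total per row over columns chosen by name, rowSums() on the selected columns is enough. The sketch below uses base R and a small made-up data frame modelled on the question; the values and the vector name content_cols are illustrative.

```r
# Made-up data modelled on the question; only some of the columns are summed
jb <- data.frame(
  V1            = c(1, 2, 2),
  Finances      = c(1, 0, 0),
  Exercise      = c(1, 1, 0),
  Volunteer     = c(1, 0, 0),
  Relationships = c(1, 0, 0),
  Laugh         = c(0, 1, 1)
)

# Columns to include in the row total (V1 is deliberately left out)
content_cols <- c("Finances", "Exercise", "Volunteer", "Relationships", "Laugh")

# One sum per row, restricted to the selected columns
jb$sum.content <- rowSums(jb[, content_cols])
jb
```

The other columns stay in the data frame untouched. In recent dplyr versions, mutate(sum.content = rowSums(across(all_of(content_cols)))) should give the same result.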

Exporting a list

I'm trying to export a list of 0's and 1's in R using the following code:
write(export, file="export.txt", ncol=1)
However, in the file "export.txt," there are 1's and 2's instead of 0's and 1's. How do I get the exported file to have 0's and 1's?
R List: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 1 1
This is what shows up in the file: 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 2 1 2 2
I suspect that export is a factor variable. write is a wrapper for cat, and cat doesn't handle factors gracefully: it prints the underlying integer codes (1, 2, ...) rather than the labels:
x <- factor(0:1)
cat(x)
## 1 2
You can coerce to character to get the proper output:
cat(as.character(x), file="export.txt")
## 0 1
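Applied to the original write() call, the same idea can be sketched as follows. The vector export is a made-up stand-in for the asker's data, and a temporary file is used so the example leaves nothing behind:

```r
# Stand-in for the asker's data: a factor with labels "0" and "1"
export <- factor(c(0, 0, 1, 0, 1))

as.integer(export)    # internal codes: 1 1 2 1 2 (what cat/write output)
as.character(export)  # labels: "0" "0" "1" "0" "1"

# Write the labels, one per line, then read them back to check
out_file <- tempfile(fileext = ".txt")
write(as.character(export), file = out_file, ncolumns = 1)
written <- readLines(out_file)
written

# If numbers are needed again, convert via character, not directly
recovered <- as.numeric(as.character(export))
```

Converting with as.numeric(export) directly would return the codes 1 and 2 again, which is the same trap as in the exported file.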
