"One-sided" Predictor Variable in Logistic Regression - r

Background Information:
I have 6 subjects; we'll call them A,B,C,D,E, and F.
Suppose they are asked to shoot basketballs from a free-throw line into a basket. A success is 1 and a failure is 0.
They performed as follows:
A - 0 0 1 0 0 1 1 1 0 0 1
B - 0 0 0 0 0 0 0 0 0 0 0
C - 1 0 1 1 0 0 0 0 1 0 0
D - 1 1 1 1 1 1 1 1 1 1 1
E - 1 1 0 0 0 0 0 1 1 0 0
F - 0 1 0 0 1 0 0 1 1 0 0
Question:
Now suppose I wanted to test that all of these subjects had the same probability of making a basket.
I would set up the logistic regression as such: Success being the probability of scoring a basket, and subject being the predictor variable.
Success ~ Subject.
Now this is where I get tangled up: I have "one-sided" levels in my predictor variable. By that I mean one subject made all of their baskets and another subject made none of theirs. How do we handle this type of logistic regression in R? Or can you suggest another method?
Thanks a ton!
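One possible way to set this up, offered as a sketch rather than a definitive answer: subjects B and D cause complete separation for their coefficients, so the individual Wald tests from glm are unreliable, but a test of "all subjects share one success probability" can still be run on the 6 x 2 table of counts or as a likelihood ratio test between the null and full models. The object names (shots, dat, tab) are made up for illustration; the data are the shots listed above.

```r
# Data from the question: 11 shots per subject (1 = success, 0 = failure)
shots <- list(
  A = c(0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1),
  B = rep(0, 11),
  C = c(1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0),
  D = rep(1, 11),
  E = c(1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0),
  F = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
)
dat <- data.frame(
  Subject = factor(rep(names(shots), lengths(shots))),
  Success = unlist(shots, use.names = FALSE)
)

# Option 1: test homogeneity of proportions on the Subject x Success table.
# Simulated p-value is used here because the expected counts are small.
tab <- table(dat$Subject, dat$Success)
set.seed(1)
ft <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)
ft$p.value

# Option 2: likelihood ratio test, null model vs. one probability per subject.
# glm warns that fitted probabilities of 0 or 1 occurred (subjects B and D);
# the warning is suppressed here because only the deviances are compared.
null_fit <- glm(Success ~ 1, family = binomial, data = dat)
full_fit <- suppressWarnings(glm(Success ~ Subject, family = binomial, data = dat))
lrt <- anova(null_fit, full_fit, test = "Chisq")
lrt
```

If per-subject estimates with usable standard errors are also needed, penalized (Firth-type) logistic regression is a commonly suggested alternative, though it is not shown here.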

Related

MICE not imputing all variables with missing values

I'm struggling to get mice to impute all the variables with missing values in my dataset. It's working perfectly for 4 of the variables, but not 3 others (and I'm getting the 3 logged events, which I suspect correspond to the 3 in question: GCSPupils, Hypoxia, Hypotension), but I can't figure out the issue. There seems to be variability in those variables (not constants), so mice should work. I want to do single imputation of 7 variables (the other variables have complete data).
# We run the mice code with 0 iterations
imp <- mice(TXAIMPACT_final, maxit = 0)
# Extract predictor Matrix and methods of imputation
predM <- imp$predictorMatrix
meth <- imp$method
#Setting values of variables I'd like to leave out to 0 in the predictor matrix
predM[,c("subjectId")] <- 0
# Specify a separate imputation model for variables of interest
# Dichotomous variable
log <- c("Hypotension", "Hypoxia")
# Unordered categorical variable
poly2 <- c("GCSPupils", "GCSMotor")
# Turn their methods matrix into the specified imputation models
meth[log] <- "logreg"
meth[poly2] <- "polyreg"
Here, I check to make sure "meth" is correct, and it is:
meth
subjectId Age GCS GCSMotor GCSPupils Glucose Hemoglobin
"" "" "" "polyreg" "polyreg" "pmm" "pmm"
Hypotension Hypoxia MarshallCT SAH EDH GOS GFAP
"logreg" "logreg" "pmm" "" "" "" ""
The methods are all correct as I specified. I do notice something funny about the Predictor Matrix, which is that the 3 variables not imputing only show "0" for their columns:
predM
subjectId Age GCS GCSMotor GCSPupils Glucose Hemoglobin Hypotension Hypoxia MarshallCT SAH EDH GOS GFAP
subjectId 0 1 1 1 0 1 1 0 0 1 1 1 1 1
Age 0 0 1 1 0 1 1 0 0 1 1 1 1 1
GCS 0 1 0 1 0 1 1 0 0 1 1 1 1 1
GCSMotor 0 1 1 0 0 1 1 0 0 1 1 1 1 1
GCSPupils 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Glucose 0 1 1 1 0 0 1 0 0 1 1 1 1 1
Hemoglobin 0 1 1 1 0 1 0 0 0 1 1 1 1 1
Hypotension 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hypoxia 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MarshallCT 0 1 1 1 0 1 1 0 0 0 1 1 1 1
SAH 0 1 1 1 0 1 1 0 0 1 0 1 1 1
EDH 0 1 1 1 0 1 1 0 0 1 1 0 1 1
GOS 0 1 1 1 0 1 1 0 0 1 1 1 0 1
GFAP 0 1 1 1 0 1 1 0 0 1 1 1 1 0
I think this is the problem, but I'm not sure how to solve it. Finally, here is my single imputation:
imp2 <- complete(mice(TXAIMPACT_final, maxit = 1,
                      predictorMatrix = predM,
                      method = meth, print = TRUE))
iter imp variable
1 1 GCSMotor Glucose Hemoglobin MarshallCT
1 2 GCSMotor Glucose Hemoglobin MarshallCT
1 3 GCSMotor Glucose Hemoglobin MarshallCT
1 4 GCSMotor Glucose Hemoglobin MarshallCT
1 5 GCSMotor Glucose Hemoglobin MarshallCT
Warning: Number of logged events: 3
Thanks in advance!
Figured it out, posting here in case someone else has this issue. The variables that were not imputing were stored as character columns, which blocked imputation. As soon as I converted them to numeric, my issues disappeared.
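A minimal sketch of the type check behind that fix, using base R only. The toy data frame is invented for illustration (only the column names come from the question). The answer above converted the columns to numeric; this sketch converts to factor instead, since the mice documentation describes logreg and polyreg as methods for factor variables. Treat that swap as an assumption to verify against your own data.

```r
# Toy stand-in for TXAIMPACT_final; the values are invented
toy <- data.frame(
  Age         = c(34, 51, 27, 62, 45),
  Hypotension = c("yes", "no", NA, "no", "yes"),
  GCSPupils   = c("both", "one", "none", NA, "both"),
  stringsAsFactors = FALSE
)

# Identify columns stored as character; these are the suspects
is_chr <- vapply(toy, is.character, logical(1))
names(toy)[is_chr]

# Convert them to factors so categorical imputation methods can use them
toy[is_chr] <- lapply(toy[is_chr], factor)
str(toy)

# With mice loaded, the logged events of a mids object `imp`
# can be inspected via imp$loggedEvents to see which variables were dropped.
```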

R design.matrix issue -- dropped column in design matrix?

I'm having an odd problem while trying to set up a design matrix to do downstream pairwise differential expression analysis on RNAseq data.
For the design matrix, I have both the donor information and each condition:
group<-factor(y$samples$group) #44 samples, 6 different conditions
sample<-factor(y$samples$samples) #44 samples, 11 different donors.
design<- model.matrix(~0+sample+group)
head(design)
Donor11.CD8 Donor12.CD8 Donor14.CD8 Donor15.CD8 Donor16.CD8
1 1 0 0 0 0
2 1 0 0 0 0
3 1 0 0 0 0
4 1 0 0 0 0
5 1 0 0 0 0
6 1 0 0 0 0
Donor17.CD8 Donor18.CD8 Donor19.CD8 Donor20.CD8 Donor3.CD8
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
6 0 0 0 0 0
Donor4.CD8 Treatment2 Treatment3 Treatment4 Treatment5
1 0 0 0 0 0
2 0 0 0 0 1
3 0 0 0 1 0
4 0 0 0 0 0
5 0 0 1 0 0
6 0 1 0 0 0
Treatment6
1 1
2 0
3 0
4 0
5 0
6 0
The issue is that I seem to be losing a condition (treatment 1) when I form the design matrix, and I'm not sure why.
Many thanks, in advance, for your help!
That's not a problem. Treatment 1 is indicated by all 0 for the columns in the design matrix. Look at row 4 - zero for Treatments 2 through 6. That means it is Treatment 1. This is called a "treatment contrast" because the coefficients in the model contrast the named treatment against the "base" level, in this case the base level is Treatment1.
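A small sketch of the same behaviour on made-up data (3 donors and 3 treatments, with hypothetical names): with ~0 + donor + group, every donor level gets a column, while the first level of group is absorbed as the baseline.

```r
# Toy factors; the names and sizes are invented for illustration
donor <- factor(rep(c("D1", "D2", "D3"), each = 3))
group <- factor(rep(c("Treatment1", "Treatment2", "Treatment3"), times = 3))

design <- model.matrix(~ 0 + donor + group)
colnames(design)

# Rows belonging to Treatment1 have zeros in every treatment column
design[group == "Treatment1", c("groupTreatment2", "groupTreatment3")]
```

The baseline can be changed with relevel(group, ref = "...") if a different treatment should serve as the reference.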

Nonlinear regression in R with multiple categorical dependent variables

I have to perform a nonlinear multiple regression with data that looks like the following:
ID Customer Country Industry Machine-type Service-hours
1 A China mass A1 120
2 B Europe customized A2 400
3 C US mass A1 60
4 D Rus mass A3 250
5 A China mass A2 480
6 B Europe customized A1 300
7 C US mass A4 250
8 D Rus customized A2 260
9 A China Customized A2 310
10 B Europe mass A1 110
11 C US Customized A4 40
12 D Rus customized A2 80
Dependent variable: Service hours
Independent variables: Customer, Country, Industry, Machine type
I did a linear regression, but because the assumption of linearity does not hold I have to perform a nonlinear regression.
I know nonlinear regression can be done with the nls function. How do I add the categorical variables to the nonlinear regression so that I get the statistical summary in R?
Column names after adding dummies:
ID Customer.a Customer.b Customer.c Customer.d Country.China Country.Europe Country.Rus Country.US Industry.customized industry.Customized Industry.mass Machine type.A1 Machine type.A2 Machine type.A3 Service hours
1 1 0 0 0 1 0 0 0 0 0 1 1 0 0 120
2 0 1 0 0 0 1 0 0 1 0 0 0 1 0 400
3 0 0 1 0 0 0 0 1 0 0 1 0 0 1 60
4 0 0 0 1 0 0 1 0 0 0 1 1 0 0 250
5 1 0 0 0 1 0 0 0 1 0 0 0 0 1 480
6 0 1 0 0 0 1 0 0 0 1 0 1 0 0 300
7 0 0 1 0 0 0 0 1 0 0 1 0 0 1 250
8 0 0 0 1 0 0 1 0 1 0 0 0 1 0 260
9 1 0 0 0 1 0 0 0 0 0 1 0 1 0 210
10 0 1 0 0 0 1 0 0 1 0 0 0 1 0 110
11 0 0 1 0 0 0 0 1 0 0 1 0 0 1 40
12 0 0 0 1 0 0 1 0 0 0 1 1 0 0 80
The way to handle categorical predictors is dependent on the number of levels the predictor can hold.
For predictors such as gender which can only take 2 forms (male or female), you can simply represent them as a binary (1,0) variable.
For predictors with greater than 2 levels, we use 1-of-k dummy encoding where k is the number of levels the particular variable takes. See the dummies package for useful functions!
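After this, you can fit the model with nls. Note that nls needs every parameter written out explicitly in the formula, together with starting values (the b names below are placeholders):
nls(Service.hours ~ b0 + b1 * predictor1 + b2 * predictor2 + bN * predictorN,
    data = df, start = list(b0 = 0, b1 = 0, b2 = 0, bN = 0))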
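As a rough sketch of how dummy columns can enter an nls formula, here is the Country variable and service hours from the question's first table, encoded with base R's model.matrix rather than the dummies package. The functional form is deliberately linear and only illustrates the syntax; the parameter names (b0, bE, bR, bU) and starting values are assumptions, and a real nonlinear model would substitute its own form.

```r
# Country and service hours taken from the question's first table
df <- data.frame(
  Country = factor(rep(c("China", "Europe", "US", "Rus"), times = 3)),
  hours   = c(120, 400, 60, 250, 480, 300, 250, 260, 310, 110, 40, 80)
)

# k - 1 dummy columns; the first level (China) is the baseline
X <- model.matrix(~ Country, data = df)[, -1]
d <- data.frame(hours = df$hours, X)
names(d)

# nls needs named parameters and starting values for each of them
fit <- nls(hours ~ b0 + bE * CountryEurope + bR * CountryRus + bU * CountryUS,
           data = d,
           start = list(b0 = 100, bE = 0, bR = 0, bU = 0))
summary(fit)
```

Because one level is dropped, b0 is the mean for the baseline country and the other coefficients are differences from it.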

R mlogit throws Error in solve.default(H, g[!fixed]):system is computationally singular: reciprocal condition number

I'm trying to do discrete choice modeling on the data below. Basically, 30 customers each have 16 different choices of pizza. They can choose more than one type of pizza, and the ones they choose are indicated by the choice variable.
pizza cust choice pan thin pineapple veggie sausage romano mozarella oz
1 1 Cust1 0 1 0 1 0 0 1 0 1
2 2 Cust1 1 0 1 1 0 0 0 0 0
3 3 Cust1 0 0 0 1 0 0 0 1 1
4 4 Cust1 1 0 1 1 0 0 0 0 0
5 5 Cust1 1 1 0 0 1 0 0 0 1
6 6 Cust1 0 0 1 0 1 0 1 0 0
7 7 Cust1 0 0 0 0 1 0 0 0 1
8 8 Cust1 1 0 1 0 1 0 0 1 0
9 9 Cust1 0 1 0 0 0 1 0 1 0
10 10 Cust1 1 0 1 0 0 1 0 0 1
11 11 Cust1 0 0 0 0 0 1 1 0 0
12 12 Cust1 0 0 1 0 0 1 0 0 1
13 13 Cust1 0 1 0 0 0 0 0 0 0
14 14 Cust1 1 0 1 0 0 0 0 1 1
15 15 Cust1 0 0 0 0 0 0 0 0 0
16 16 Cust1 0 0 1 0 0 0 1 0 1
17 1 Cust10 0 1 0 1 0 0 1 0 1
18 2 Cust10 0 0 1 1 0 0 0 0 0
19 3 Cust10 0 0 0 1 0 0 0 1 1
20 4 Cust10 0 0 1 1 0 0 0 0 0
I use the command below to transform my data. I tried making a few changes here, like adding chid.var = "chid" and alt.levels = c(1:16). If I use both alt.levels and alt.var, it gives me an error saying pizza already exists and will be replaced. However, I get no error if I use only one of them.
pz <- mlogit.data(pizza,shape = "long",choice = "choice",
varying = 4:8, id = "cust", alt.var = "pizza")
Finally, when I use the mlogit command, I get this error.
mlogit(choice ~ pan + thin + pineapple + veggie + sausage + romano + mozarella + oz, pz)
Error in solve.default(H, g[!fixed]) :
system is computationally singular: reciprocal condition number = 8.23306e-19
This is my first post on Stack Overflow. I visit this site very often and so far never needed to post, as I always found existing solutions. I went through almost all similar posts like this one, but in vain. I'm new to discrete choice modeling, so I don't know if I'm making a fundamental mistake here.
Also, I'm not really sure what chid.var does.
I couldn't solve the mlogit problem itself, but you can use the multinom function from the nnet package instead. It seems to work, and I verified the answer.
The dataset stays the same as shown in the question, so no transformation is needed.
library("nnet")
pizza_model <- multinom(choice ~ Price + IsThin + IsPan ,data=pizza_all)
summary(pizza_model)
where choice is a dependent categorical variable which you want to predict. Price, IsThin, and IsPan are independent variables. Below is the output
Call:
multinom(formula = choice ~ Price + I_cPan + I_cThin, data = pizza_all)
Coefficients:
Values Std. Err.
(Intercept) 0.007192623 1.3298018
Price -0.149665357 0.1464976
I_cPan 0.098438084 0.3138538
I_cThin 0.624447867 0.2637110
Residual Deviance: 553.8519
AIC: 561.8519

Permutation position of numbers in R

I'm looking for a function in R that can do a permutation. For example, I have a vector with five 1s and ten 0s like this:
> status=c(rep(1,5),rep(0,10))
> status
[1] 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
Now I'd like to randomly permute the positions of these numbers while keeping the same number of 0s and 1s in the vector, to get a new series of numbers, for example something like this:
1 1 0 1 0 1 0 0 0 0 0 1 0 0 0
or
1 0 0 0 0 0 0 1 1 0 0 1 0 1 0
I found that the function sample() can help with sampling, but the number of 1s and 0s is not the same each time. Do you know how I can do this in R? Thanks in advance.
We can use sample
sample(status)
#[1] 1 0 0 1 0 0 1 0 0 0 0 1 0 1 0
sample(status)
#[1] 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0
If we call sample on the whole vector with no other arguments, it returns a random permutation, so the count of each unique element stays the same:
colSums(replicate(5, sample(status)))
#[1] 5 5 5 5 5
i.e. we get five 1s in each permutation, so the remaining ten values are 0s.
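A short check of that behaviour, as a sketch: sample(x) with the default replace = FALSE and no size argument only reorders x, which table() confirms. The counts can change when replace = TRUE or a different size is passed, which may be what happened in the question. The name perm is made up here.

```r
status <- c(rep(1, 5), rep(0, 10))

set.seed(42)            # only to make the illustration reproducible
perm <- sample(status)  # random permutation: same values, new order

table(status)
table(perm)

# Sampling with replacement does NOT preserve the counts in general
table(sample(status, replace = TRUE))
```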
