clogitL1: extract regression coefficients in R

I'm an R newbie. I'm using "clogitL1" to run a regularized conditional logistic regression for a matched case-control study with 1021 independent variables (metabolites). I'm not able to extract the regression coefficients: I've tried summary(x), coef(x), coefficients(x), and x$beta, and none of them works. The model itself runs fine, and if I follow it with "cv.clogitL1" I can extract the cross-validated coefficient estimates, but not the estimated coefficients for the original model. Here's some of my code:
strata <- sort(data.meta$MATCHED_NEW)
condlog <- clogitL1(y = data.meta$BCR, x = data.meta$ln_metab[, data.features], strata,
                    numLambda = 100, minLambdaRatio = 0.000001, alpha = 1.0)
"strata" is a vector indicating pairing of cases & controls.
"data.meta$BCR" is a vector inicating case or control status
"data.meta$ln_metab" is a matrix with observations as rows and metabolite levels as columns
"data.features" is a vector indicating which metabolites passed several dimension reduction filters.
Appreciate any suggestions.
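For reference, one way to see what the fitted object actually contains (a sketch; the beta component name is an assumption based on the package's documented return value, so confirm it with str() first):

str(condlog)                          # list every component the fitted object stores

# Assuming the coefficient path lives in a component such as beta, with
# one row per lambda value along the path (hypothetical layout):
dim(condlog$beta)
condlog$beta[nrow(condlog$beta), ]    # coefficients at the last lambda on the path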

Related

Fit multiple linear regression without an intercept with the function lm() in R

Can you please help with this question in R? I need to get more than one predictor:
Fit a multiple linear regression without an intercept with the function lm() to the training data,
using the variable (y.train) as the target variable and the variables (X.mat.train) as
predictors. Look at the vector of estimated coefficients of the model and compare it with
the vector of 'true' values beta.vec graphically
(Tip: build a plot of the differences of the absolute values of the estimated and true values).
I have already tried it with the code posted at the end, but it gives me only one predictor, and in this example I need more than one.
I think the problem is in the first line, but I couldn't find a way to fix it.
I can't put the data set here because it's large, but I have a variable that stores 190 observations from a vector (y.train) and another that stores 190 observations from a matrix (X.mat.train). It should give more than one predictor, but for me it gives only one.
simple.fit <- lm(y.train ~ 0 + X.mat.train)  # target ~ no intercept + predictors
summary(simple.fit)                          # show the linear regression output
plot(simple.fit)
abline(simple.fit)
n <- summary(simple.fit)$coefficients
estimated_coeff <- n[, 1]
estimated_coeff
plot(estimated_coeff)
# Coefficients: X.mat.train 0.5018
v <- sum(beta.vec)
# 0.5369
plot(beta.vec)
plot(beta.vec, simple.fit)
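A frequent cause of getting a single pooled coefficient is X.mat.train not being a numeric matrix by the time lm() sees it. A sketch of that fix plus the tip's difference plot (a guess at the cause; y.train, X.mat.train, and beta.vec are the asker's objects):

X   <- as.matrix(X.mat.train)   # make sure lm() sees a numeric matrix
fit <- lm(y.train ~ 0 + X)      # no intercept: one coefficient per column of X
est <- coef(fit)                # vector of estimated coefficients

# Tip from the assignment: plot the differences of the absolute values
plot(abs(est) - abs(beta.vec), xlab = "coefficient index",
     ylab = "|estimated| - |true|")
abline(h = 0, lty = 2)

Note that abline(simple.fit) and plot(beta.vec, simple.fit) only make sense for a one-predictor fit; with many coefficients the first warns and the second errors.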

mnrval Matlab function in R

Can someone please help me with the equivalent of the mnrval function in R? I have not been able to find one that returns predicted probabilities from coefficient estimates and predictor values. I tried to rewrite the MATLAB function in R but was unable to, because one of the inner functions it uses is private. I would highly appreciate your help on this.
The documentation page for mnrval() states:
MNRVAL Predict values for a nominal or ordinal multinomial regression model.
PHAT = MNRVAL(B,X) computes predicted probabilities for the nominal
multinomial logistic regression model with predictor values X. B is the
intercept and coefficient estimates as returned by the MNRFIT function. X
is an N-by-P design matrix with N observations on P predictor variables.
MNRVAL automatically includes intercept (constant) terms in the model; do
not enter a column of ones directly into X. PHAT is an N-by-K matrix of
predicted probabilities for each multinomial category.
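A minimal sketch of mnrval's nominal-model behaviour in R, assuming B is the (P+1)-by-(K-1) matrix returned by mnrfit with the last category as the reference, and X is an N-by-P predictor matrix without a column of ones:

mnrval_r <- function(B, X) {
  eta   <- cbind(1, X) %*% B    # N x (K-1) log odds vs. the reference category
  expo  <- exp(eta)
  denom <- 1 + rowSums(expo)
  cbind(expo, 1) / denom        # N x K matrix of predicted probabilities
}

Staying inside R, the more usual route is to fit the model with nnet::multinom() and get the probabilities with predict(fit, newdata, type = "probs").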

Logistic regression without any outcome data

I am trying to perform logistic regression on data that contains a binary outcome. However, I do not have access to the outcome data.
I've calculated probabilities of a "1" outcome for each subject by assigning "risk points" to certain values of each variable and adding them up, so that the probability of a "1" is (sum of the subject's risk points) / (total number of possible risk points). I then took the log of the odds to get the logit, so I have a logit value between -3 and 2 for each subject.
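In R terms, that construction is roughly the following (a sketch; risk_points and total_points stand in for the asker's per-subject sums):

p     <- risk_points / total_points   # probability of a "1" for each subject
logit <- log(p / (1 - p))             # log odds, i.e. the logit (about -3 to 2 here)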
However, I would like to use logistic regression to evaluate which variables have the greatest effect on the outcome probabilities. Is there a way in R to perform logistic regression using only the predictor variables and the logit, without the binary outcome data? I have tried glm() and it does not work, because logistic regression requires binary outcome data.
Thank you!

Beta coefficients in ANOVA with R and XLStat

I'm working with the software R and XLStat. I've conducted a one-way ANOVA (my categorical variable has 3 levels (1, 2, 3) and my response variable is quantitative on a 1-10 scale).
I've run this ANOVA in both R and XLStat, and the outputs for the Fisher F, p-value, coefficient estimates, t-values, standard errors, etc. are exactly the same.
However, XLStat offers an extra output: the standardized coefficients (also called beta coefficients). At first I was surprised, because I didn't think a beta coefficient could be calculated for a categorical variable, and according to the literature I've read it doesn't make sense.
Anyway, I tried to reproduce these coefficients in R with the only formula I found: beta = estimate * sd(x)/sd(y), where sd(x) is the standard deviation of the categorical variable (which R converts to a numeric variable in order to compute sd(x), which seems logical) and sd(y) is the standard deviation of my response variable.
The first beta I obtained with R matches XLStat, but not the second and third. Given that the first one matches, I suppose XLStat also converts the categorical variable to a numeric variable (which is senseless, but that's not the question).
Moreover, I ran the ANOVA in Statistica to see whether XLStat had made a mistake, but its beta coefficients are the same as XLStat's.
So my question is this: what is the formula for the beta coefficients in a one-way ANOVA?
I would also like to ask about the relevance of these beta coefficients for a categorical variable. From what I've read, they don't make sense.
PS: contrasts in R and XLStat are sum(ai) = 0. For the beta coefficients, XLStat removes the intercept. I guess this could matter, but I'm not sure how.
The formula for obtaining beta coefficients from metric coefficients in an ANOVA is the same as in a linear regression. Standardized coefficients have no sensible interpretation for categorical variables, but they are useful for comparing the relative effects of IVs measured on different scales.
In R, either use scale() to transform the data to z-scores before fitting the model, or apply lm.beta() from the lm.beta package to the fitted lm object.
It is not clear why you would obtain different beta coefficients with XLStat, but it could have something to do with degrees of freedom if it's not an error. This example compares the lm.beta() function in R with SAS and obtains the same coefficients.
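A minimal sketch of both routes (assuming a data frame d with response y and a 3-level factor group; lm.beta() is from the CRAN package lm.beta):

library(lm.beta)
fit <- lm(y ~ group, data = d)
lm.beta(fit)                               # standardized coefficients for the fitted model

# Manual route, applying beta_j = b_j * sd(x_j) / sd(y) to each dummy
# column of the design matrix (the formula cited in the question):
b <- coef(fit)[-1]
X <- model.matrix(fit)[, -1, drop = FALSE]
b * apply(X, 2, sd) / sd(d$y)

Whether two packages then agree comes down to how each standardizes the dummy columns under sum-to-zero contrasts, which is presumably where XLStat diverges.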

Multiple Linear Regression and MSE from R

I have a dataset (found here: https://netfiles.umn.edu/users/nacht001/www/nachtsheim/Kutner/Appendix%20C%20Data%20Sets/APPENC01.txt) and I have done some R coding for linear regression. In the attached dataset the columns are not labeled; I had to label them and save the file as a CSV, and I apologize that I can't get that on here… The columns I am using are column 3 (age), column 4 (infection), column 5 (culratio), column 9 (region), column 10 (census), and column 12 (service). I named the dataset hospital.
I am supposed to "for each geographic region, regress infection risk (Y) against the predictor variables age, culratio, census, and service using a first-order regression model", and then I need to find the MSE for each region. This is the code I have:
NE <- subset(hospital, region == "1")
NC <- subset(hospital, region == "2")
S  <- subset(hospital, region == "3")
W  <- subset(hospital, region == "4")
Then, to fit a first-order linear regression model, I use the same basic code for each region:
NE.Model <- lm(infection ~ age + culratio + census + service, data = NE)
summary(NE.Model)
I can get the adjusted R-squared value from this output, but how do I find the MSE?
Moving my comment to an answer. The "errors" or "residuals" are part of the model object (NE.Model$residuals), so getting the mean squared error is as easy as mean(NE.Model$residuals^2).
Just as a note, you could do this in fewer steps by fitting a region fixed effect term in your model and then calculating the MSE for each subset of the residuals. Same difference, really.
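A sketch of that fixed-effect route (assuming the hospital data frame described above; note that a single additive model shares slopes across regions, unlike the four separate fits):

full <- lm(infection ~ age + culratio + census + service + factor(region),
           data = hospital)
tapply(residuals(full)^2, hospital$region, mean)   # per-region mean squared residual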
