How to train and validate a multivariate multiple regression model with multiple responses in R?

I would like to train my multivariate multiple regression (MMR) model and validate its accuracy.
The model has four dependent variables (y1, y2, y3, y4) and two independent variables (x1 and x2).
First, I fitted the model with R's lm() function:
modelo_MMR <- lm(cbind(y1, y2, y3, y4) ~ x1 + x2, data = my_data)
I have checked that this single call can be split into four separate regressions:
m1 <- lm(y1 ~ x1 + x2, data = my_data)
m2 <- lm(y2 ~ x1 + x2, data = my_data)
m3 <- lm(y3 ~ x1 + x2, data = my_data)
m4 <- lm(y4 ~ x1 + x2, data = my_data)
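One quick way to confirm the equivalence is to compare the coefficients (coef() on a multivariate lm fit returns a matrix with one column per response); a minimal check:
coef(modelo_MMR)                               # matrix: one column per response
all.equal(coef(modelo_MMR)[, "y1"], coef(m1))  # should be TRUE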
To predict y1, y2, y3, and y4 from values of x1 and x2, I used the predict() function, and I obtained the same results from:
nd <- data.frame(x1 = 9, x2 = 40)
p_MMR <- predict(modelo_MMR, nd)
as from:
p_m1 <- predict(m1, nd)
p_m2 <- predict(m2, nd)
p_m3 <- predict(m3, nd)
p_m4 <- predict(m4, nd)
When I use lm(cbind(y1, y2, y3, y4) ~ x1 + x2, data = my_data) in scripts that validate models, I get several errors, because those scripts expect lm(), glm(), etc. with one response variable instead of four. I was therefore thinking of validating each multiple linear regression (m1, m2, m3 and m4) separately, using the same training and test sets for cross-validation, as in the sketch below. I was also considering fitting a different machine-learning model to each response instead of an MMR, because I have not yet found information on how to train an MMR with naive Bayes, random forests, k-NN, etc.
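For example, a per-response k-fold cross-validation could look like this (a sketch; fold assignment via sample() and RMSE as the error metric are just one possible choice):
set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(my_data)))
rmse <- matrix(NA, nrow = k, ncol = 4,
               dimnames = list(NULL, c("y1", "y2", "y3", "y4")))
for (i in 1:k) {
  train <- my_data[folds != i, ]
  test  <- my_data[folds == i, ]
  fit   <- lm(cbind(y1, y2, y3, y4) ~ x1 + x2, data = train)
  pred  <- predict(fit, newdata = test)   # matrix: one column per response
  rmse[i, ] <- sqrt(colMeans((test[, c("y1", "y2", "y3", "y4")] - pred)^2))
}
colMeans(rmse)  # average out-of-sample RMSE per response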
Could anyone suggest how I could train an MMR model and validate its accuracy?
Thanks

Related

Use linear model for propensity score matching (R MatchIt)

I want to use the propensity score fitted from a linear model to match observations using the MatchIt library.
For example, suppose df is a dataframe. If I were to balance its treatment column in terms of x1, x2 and x3 using the propensity score from a logit, I would set distance = 'glm' and link = 'logit':
m <- matchit(formula = treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = 'glm',
             link = 'logit')
How can I do the same with a linear model instead of a logit?
As per the documentation:
When link is prepended by "linear.", the linear predictor is used instead of the predicted probabilities.
Hence, I tried:
m <- matchit(formula = treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = 'glm',
             link = 'linear.logit')
I'm afraid that doing this (link = 'linear.logit') would use the log-odds from the logit model as the score.
Is there any way I can just use a linear model instead of a generalized linear model?
You cannot do this from within matchit() (and, in general, you shouldn't, which is why it is not allowed). But you can always estimate the propensity scores outside matchit() however you want and then supply them to matchit(). To use a linear probability model, you would run the following:
fit <- lm(treatment ~ x1 + x2 + x3, data = df)
ps <- fit$fitted
m <- matchit(treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = ps)
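From there you can check balance and extract the matched sample with MatchIt's usual tools:
summary(m)                # covariate balance before/after matching
matched <- match.data(m)  # matched dataset, ready for the outcome analysis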

How do I run a regression with different variables in R using a loop?

I want to estimate the following regression models:
y <- rnorm(1:100)
x1 <- 1:100
x2 <- 1:100
x3 <- 1:100
my_data <- data.frame(cbind(y, x1, x2, x3))
m1 <- lm(y ~ x1, data = my_data)
m2 <- lm(y ~ x2, data = my_data)
m3 <- lm(y ~ x3, data = my_data)
I want to run several models like this, using the same dataset and a different independent variable in each model. How can I use a loop to run each model?
This can be easily accomplished using the apply() function:
data_to_analyse <- my_data[, -1]  # leave out the y column for this analysis
out <- apply(data_to_analyse, 2, function(current_col) {
  lm(my_data$y ~ current_col)
})
This takes each column of the data frame ("current_col") and regresses y on it. The output is a list containing the result of each regression, with each entry named after the corresponding variable.
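If you prefer the fitted models to carry the real variable names (so that summary() shows, say, "x1" rather than "current_col"), here is a sketch using reformulate() from base R:
predictors <- setdiff(names(my_data), "y")
models <- lapply(predictors, function(v) {
  lm(reformulate(v, response = "y"), data = my_data)
})
names(models) <- predictors
summary(models$x1)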

In a glm model, how do I find the value of one independent variable, given the dependent variable and the remaining independent variables?

I created a glm model in R, with 3 independent variables and a binary dependent one:
model <- glm(y ~ x1 + x2 + x3, data = train, family = binomial)
This is great for predicting the outcome, given specific values of the independent variables, with a simple call:
newdata <- data.frame(x1 = 10, x2 = 8, x3 = 5)
predict(model, newdata, type = "response")
But what about finding out x1, given that we want:
y = 1
x2 = 10
x3 = 2
I've heard a for loop is a good option to get a list of possible value combinations, but I don't know how to code it.
Does anyone know?
Or is there a built-in function that solves this issue?
Think I've got it:
x <- seq(1, 100, by = 1)
r <- numeric(length(x))
cf <- coef(model)  # (Intercept), x1, x2, x3
for (x1 in x) {
  r[x1] <- 1 / (1 + exp(-(cf["(Intercept)"] + cf["x1"] * x1 +
                          cf["x2"] * 10 + cf["x3"] * 2)))
}
cbind(x, r)
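Instead of scanning a grid, you can also invert the logistic function directly. A sketch, assuming a target probability p for y = 1 (p = 0.5 here is just an illustrative choice):
p  <- 0.5
cf <- coef(model)
x1_needed <- (log(p / (1 - p)) - cf["(Intercept)"] -
              cf["x2"] * 10 - cf["x3"] * 2) / cf["x1"]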

R: Plotting prediction results for a multiple regression

I want to observe the effect of a treatment variable on my outcome Y. I ran a multiple regression: fit <- lm(Y ~ x1 + x2 + x3), where x1 is the treatment variable and x2, x3 are the control variables. I used the predict function, holding x2 and x3 at their means, and plotted the predictions.
Now I would like to add a line to my plot, similar to the abline of a simple regression, but I do not know how to do this.
I think I have to use lines(x, y), where y is the prediction and x is a sequence of values for my variable x1, but R tells me that the lengths of x and y differ.
I think you are looking for termplot:
## simulate some data
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
x3 <- runif(100)
y <- cbind(1,x1,x2,x3) %*% runif(4) + rnorm(100, sd = 0.1)
## fit a model
fit <- lm(y ~ x1 + x2 + x3)
termplot(fit, se = TRUE, terms = "x1")
termplot uses predict.lm(..., type = "terms") for term-wise prediction. If the model has an intercept (as above), predict.lm centres each term (see "What does predict.glm(..., type = "terms") actually do?"). In this way, each term is predicted to be 0 at the mean of its covariate, and the standard error at the mean is 0 (hence the confidence interval touches the fitted line at the mean).
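If you would rather draw the line yourself, the usual fix for the length error is to build an explicit grid for x1 with x2 and x3 held at their means, and predict on that grid (a sketch using the simulated data above):
nd <- data.frame(x1 = seq(min(x1), max(x1), length.out = 100),
                 x2 = mean(x2), x3 = mean(x3))
plot(x1, y)  # raw data
lines(nd$x1, predict(fit, newdata = nd))  # effect of x1 at mean x2, x3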

Subset of predictors using coefplot()

I'd like to do a plot of coefficients using coefplot() that only takes into account a subset of the predictors that I'm using. For example, if you have the code
y1 <- rnorm(1000,50,23)
x1 <- rnorm(1000,50,2)
x2 <- rbinom(1000,1,prob=0.63)
x3 <- rpois(1000, 2)
fit1 <- lm(y1 ~ x1 + x2 + x3)
and then ran
coefplot(fit1)
it would give you a plot displaying the coefficients of the intercept, x1, x2 and x3. How can I modify this so I only get the coefficients for, say, x1 and x2?
You can use the predictors argument, and it will plot only the coefficients you need:
library(coefplot)
coefplot(fit1, predictors=c('x1','x2'))
Output: a coefficient plot showing only the x1 and x2 coefficients.
