I'm playing around with linear regression in Julia using the GLM package. I'm interested in getting the model matrix out of the LM object so I can calculate leverage values (the diagonals of the hat matrix), but I can't find any function to do this. The equivalent in R is model.matrix().
Any suggestions?
I guess I could just do the regression manually via matrix multiplication, but didn't want to reinvent the wheel.
Just figured this out: calling names(OLS) on the LM object (which I am calling OLS) lists its fields, and the model matrix can then be extracted as OLS.mm.
If the data you're using is a DataFrame, you can use the following:
using DataFrames, RDatasets, GLM  # dataset() is provided by RDatasets, not DataFrames
dat = dataset("car", "Vocab")
x = ModelMatrix(ModelFrame(@formula(Vocabulary ~ Year + Sex + Education), dat)).m
Assume x2-x4 are continuous predictors, each occupying one column in the design matrix created using lm() in R. I want to include x1, a categorical variable that has 3 levels.
Regression R code:
fit <- lm(y ~ as.factor(x1) + x2 + x3 + x4, data = mydata)
How can I print the design matrix from lm() in R and what would it look like? I need to know the default coding used in R so I can write contrast statements properly.
I think the model.matrix() function is what you're after.
As kjetil b halvorsen says, you can specify the model formula in the argument. Or you can just hand model.matrix() the fitted model:
fit <- lm(y ~ as.factor(x1) + x2 + x3 + x4, data = mydata)
model.matrix(fit)
You can even get a design matrix for new data: model.matrix(fit, data=newdata)
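To see what the design matrix looks like with R's default (treatment) coding, here is a minimal sketch using made-up data for the hypothetical variables in the question:
mydata <- data.frame(
  y  = rnorm(9),
  x1 = rep(c("a", "b", "c"), 3),  # 3-level categorical predictor
  x2 = rnorm(9),
  x3 = rnorm(9),
  x4 = rnorm(9)
)
fit <- lm(y ~ as.factor(x1) + x2 + x3 + x4, data = mydata)
model.matrix(fit)
# Columns: (Intercept), as.factor(x1)b, as.factor(x1)c, x2, x3, x4
# The first level "a" is the reference; its rows have 0 in both dummy columns.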
The comment by kjetil b halvorsen helped me the most:
Call res <- lm(..., x = TRUE); the design matrix will then be returned in the model object res (as res$x). Call str(res) to see the structure of res, and you will know how to get the design matrix from it. But it is easier to call model.matrix(y ~ x + f, data = ...) with the same model formula you use in lm().
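As a short sketch of the x = TRUE route, reusing the made-up mydata from the sketch above:
res <- lm(y ~ as.factor(x1) + x2 + x3 + x4, data = mydata, x = TRUE)
str(res$x)  # the design matrix is stored directly in the fitted object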
I want to estimate a Seemingly Unrelated Regression (SUR) model, but I don't want to code every single regression equation in R by hand. I want to fill a matrix or a list with the equations in a loop. I thought of doing it this way:
for (n in ..)
SURmatrix[n, 1] <- lm(esfirm.w[, n] ~ esmarkt.w[, n])
and then fit the model this way:
fitsur <- systemfit(SURmatrix[, 1])
but the problem is that the "formula" type isn't compatible with a matrix.
Does anyone have an idea how to solve this problem?
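A minimal sketch of one way around this, assuming esfirm.w and esmarkt.w are matrices with one column per firm: formulas are language objects, so store the equations in a list rather than a matrix, and pass the list to systemfit():
library(systemfit)

eqns <- list()
for (n in 1:ncol(esfirm.w)) {
  # build the n-th equation as a formula object and collect it in the list
  eqns[[n]] <- as.formula(paste0("esfirm.w[, ", n, "] ~ esmarkt.w[, ", n, "]"))
}
fitsur <- systemfit(eqns, method = "SUR")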
I was wondering if it is possible to predict with the plm() function from the plm package in R for a new dataset of predictor variables. I have created a model object using:
model <- plm(formula, data, index, model = 'pooling')
Now I'm hoping to predict the dependent variable for a new dataset which has not been used in the estimation of the model. I can do it by using the coefficients from the model object like this:
col_idx <- c(...)  # index columns used in the model and columns dropped due to collinearity
df <- cbind(rep(1, nrow(df)), df[(1:ncol(df))[-col_idx]])  # prepend an intercept column
fitted_values <- as.matrix(df) %*% as.matrix(model$coefficients)
That is, I first collect in col_idx the index columns used in the model and the columns dropped due to collinearity, and then construct a matrix of data to be multiplied by the coefficients from the model. However, I can see errors occurring much more easily with this manual dropping of columns.
A function designed to do this would make the code a lot more readable, I guess. I have also found the pmodel.response() function, but I can only get this to work for the dataset which was used in estimating the actual model object.
Any help would be appreciated!
I wrote a function (predict.out.plm) to do out-of-sample predictions after estimating First Differences or Fixed Effects models with plm.
The function is posted here:
https://stackoverflow.com/a/44185441/2409896
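For the pooling case in the question, here is a minimal base-R sketch that avoids the manual column bookkeeping. It assumes formula and model are the objects from the question and newdata is a data.frame with the predictor columns; it only applies to model = 'pooling', where the coefficients apply directly to the raw regressors:
# rebuild the design matrix from the model formula so factor coding stays consistent
X_new <- model.matrix(delete.response(terms(formula)), data = newdata)
# keep only the columns for which a coefficient was actually estimated
X_new <- X_new[, names(coef(model)), drop = FALSE]
fitted_values <- drop(X_new %*% coef(model))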
I'm currently going through the 'Introduction to Statistical Learning' MOOC by Stanford OpenX. In one of the lab exercises, it suggests creating a model matrix from the test data by explicitly using model.matrix().
Extract from textbook
We now compute the validation set error for the best model of each model size. We first make a model matrix from the test data.
test.mat = model.matrix(Salary ~ ., data = Hitters[test, ])
The model.matrix() function is used in many regression packages for building an X matrix from data. Now we run a loop, and for each size i, we extract the coefficients from regfit.best for the best model of that size, multiply them into the appropriate columns of the test model matrix to form the predictions, and compute the test MSE.
val.errors = rep(NA, 19)
for (i in 1:19) {
  coefi = coef(regfit.best, id = i)
  pred = test.mat[, names(coefi)] %*% coefi
  val.errors[i] = mean((Hitters$Salary[test] - pred)^2)
}
I understand that model.matrix() converts string/factor variables into separate indicator columns for their levels, and that models like lm() do the conversion under the hood.
However, what are the instances that we would explicitly use model.matrix(), and why?
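One common instance (my illustration, not from the extract above): fitting functions that take a numeric matrix instead of a formula, such as glmnet(), make you expand factors explicitly yourself. A minimal sketch using the same Hitters data:
library(ISLR)    # provides the Hitters data used in the lab
library(glmnet)  # glmnet() wants a numeric x matrix, not a formula

hit <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., data = hit)[, -1]  # drop the intercept column
y <- hit$Salary
fit <- glmnet(x, y)  # factors were expanded to dummy columns by model.matrix()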
I am trying to extract weights for R's ksvm package.
Usually I use the e1071 package, where the weights can be computed by
weights = t(svmmodel$coefs) %*% svmmodel$SV  # coefs = alpha_i * y_i, SV = support vectors
However, when I look into a ksvm model from the kernlab package, both the coefficients and the alphas are lists of the same dimension; the alphas are not returned as plain vectors.
My question is: how should I access the support vectors, including the zero values? Would I have to use SVindex to map them back to the original input?
Thanks.
Use xmatrix in the ksvm model (https://www.rdocumentation.org/packages/kernlab/versions/0.9-29/topics/ksvm-class). The xmatrix slot of the ksvm model svmmodel can be accessed using svmmodel#xmatrix.
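A minimal sketch of the kernlab analogue of the e1071 computation above, assuming a binary classification problem with a linear (vanilladot) kernel, where the accessors return one-element lists:
library(kernlab)

# two-class toy problem so the binary accessors apply
d <- iris[iris$Species != "setosa", ]
d$Species <- factor(d$Species)
svmmodel <- ksvm(Species ~ ., data = d, kernel = "vanilladot")

# coef() holds alpha_i * y_i per support vector; xmatrix() holds the support vectors
# (note: ksvm() scales inputs by default, so these weights refer to scaled predictors)
w <- t(coef(svmmodel)[[1]]) %*% xmatrix(svmmodel)[[1]]
# SVindex(svmmodel) maps the support vectors back to rows of the training data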