I would like to use in R the following expression given in Matlab:
y1=polyval(p,end_v);
where p in Matlab is:
p = polyfit(Nodes_2,CInt_interp,3);
Right now in R I have:
p <- lm(Spectra_BIR$y ~ poly(Spectra_BIR$x,3, raw=TRUE))
But I do not know which command in R corresponds to the polyval from Matlab.
Many thanks!
R:
library(polynom)
predict(polynomial(1:3), c(5,7,9))
[1] 86 162 262
MATLAB (official example):
p = [3 2 1];
polyval(p,[5 7 9])
ans = 86 162 262
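Note that MATLAB's polyval takes coefficients in descending order of power, while polynom::polynomial() expects ascending order, so a coefficient vector ported from MATLAB should be reversed first. A minimal sketch:
library(polynom)
p_matlab <- c(3, 2, 1)                          ## MATLAB-style: 3*x^2 + 2*x + 1
predict(polynomial(rev(p_matlab)), c(5, 7, 9))  ## reverse to ascending order
[1] 86 162 262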
There is no exact equivalent in R for polyfit and polyval, as these MATLAB routines are quite primitive compared with R's statistical toolbox.
In MATLAB, polyfit mainly returns polynomial regression coefficients (covariance can be obtained if required, though). polyval takes the regression coefficients p and a set of new x values at which to evaluate the fitted polynomial.
In R, the workflow is: use lm to obtain a regression model (much broader; not restricted to polynomial regression); use summary.lm for the model summary, e.g. to obtain the covariance; use predict.lm for prediction.
So here is the way to go in R:
## don't use `$` in formula; use `data` argument
fit <- lm(y ~ poly(x,3, raw=TRUE), data = Spectra_BIR)
Note that fit contains not only the coefficients, but also other components needed for prediction and inference. If you want to extract the coefficients, use coef(fit), or unname(coef(fit)) if you don't want the coefficient names displayed.
Now, to predict, we do:
x.new <- rnorm(5) ## some random new `x`
## note, `predict.lm` takes a "lm" model, not coefficients
predict.lm(fit, newdata = data.frame(x = x.new))
predict.lm is much more powerful than polyval: it can also return confidence and prediction intervals. Have a read of ?predict.lm.
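For instance, continuing with fit and x.new from above (a small sketch), intervals are requested via the interval argument:
## 95% confidence interval for the mean response at the new x values
predict.lm(fit, newdata = data.frame(x = x.new), interval = "confidence")
## 95% prediction interval for new observations
predict.lm(fit, newdata = data.frame(x = x.new), interval = "prediction")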
There are a few sensitive issues with the use of predict.lm. There have been countless questions and answers regarding this; here are the root questions to which I often close those questions as duplicates:
Getting Warning: “ 'newdata' had 1 row but variables found have 32 rows” on predict.lm in R
Predict() - Maybe I'm not understanding it
So make sure you get the good habit of using lm and predict at the early stage of learning R.
Extra
It is also not difficult to construct something equivalent to polyval in R. The function g in my answer Function for polynomials of arbitrary order does this, and by setting nderiv it can also return derivatives of the polynomial.
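For reference, here is a minimal polyval-style helper (an illustrative sketch with a made-up name, not the g function from that answer), taking MATLAB-style descending coefficients:
polyval_r <- function(p, x) {
  ## Horner's scheme: (((p[1]*x + p[2])*x + p[3])*x + ...)
  out <- rep(p[1], length(x))
  for (i in seq_len(length(p) - 1L)) out <- out * x + p[i + 1L]
  out
}
polyval_r(c(3, 2, 1), c(5, 7, 9))   ## 86 162 262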
Related
I was searching for this answer and I'm really surprised that I haven't found it. I just want to perform three-level logistic regression in R.
Let's define some artificial data:
set.seed(42)
y <- sample(0:2, 100, replace = T)
x <- rnorm(100)
My variable y contains three values: 0, 1, and 2. So I thought that the simplest way would be just to use:
glm(y ~ x, family = binomial("logit"))
However, I got an error message saying that y should be in the interval [0,1]. Do you know how I can perform this regression?
Please note: I know that it's not straightforward to perform logistic regression with more than two levels, and that there are several techniques for doing so, e.g. one-vs-all. But while searching I wasn't able to find any implementation.
Logistic regression as implemented by glm only works for 2 levels of output, not 3.
The message is a little vague because you can specify the y-variable in logistic regression either as 0s and 1s, or as a proportion (between 0 and 1) together with a weights argument specifying the number of subjects each proportion is based on.
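For example, a small sketch with made-up numbers of the proportion-plus-weights form:
n_trials <- c(50, 40)            ## number of subjects per group
p_success <- c(0.30, 0.55)       ## observed proportion of successes per group
grp <- factor(c("A", "B"))
## proportion response, with the group sizes supplied via `weights`
glm(p_success ~ grp, family = binomial("logit"), weights = n_trials)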
With 3 or more ordered levels in the response you need to use a generalization, one common generalization is proportional odds logistic regression (also goes by other names). The polr function in the MASS package and the lrm function in the rms package (and probably other functions in other packages) fit these types of models, but glm does not.
set.seed(42)
y <- sample(0:2, 100, replace = TRUE)
x <- rnorm(100)
multinomial regression
If you don't want to treat your responses as ordered (i.e., nominal or categorical values):
library(nnet) ## 'recommended' package, i.e. installed by default
multinom(y~x)
Results
# weights: 9 (4 variable)
initial value 109.861229
final value 104.977336
converged
Call:
multinom(formula = y ~ x)
Coefficients:
(Intercept) x
1 -0.001529465 0.29386524
2 -0.649236723 -0.01933747
Residual Deviance: 209.9547
AIC: 217.9547
Or, if your responses are ordered:
ordinal regression
MASS::polr() does proportional-odds logistic regression. (You may also be interested in the ordinal package, which has more features; it can also do multinomial models.)
library(MASS) ## also 'recommended'
polr(ordered(y)~x)
Results
Call:
polr(formula = ordered(y) ~ x)
Coefficients:
x
0.06411137
Intercepts:
0|1 1|2
-0.4102819 1.3218487
Residual Deviance: 212.165
AIC: 218.165
If you read the error message, it offers a hint that you might get success with:
y <- sample(seq(0,1,length=3), 100, replace = T)
And in fact, you do. Your challenge now might be to interpret that in the context of the actual situation in reality (which you have not described). You do get a warning, but R warnings are not errors.
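For completeness, a sketch of what that looks like (the exact warning wording may vary by R version):
set.seed(42)
y <- sample(seq(0, 1, length = 3), 100, replace = TRUE)   ## values 0, 0.5, 1
x <- rnorm(100)
glm(y ~ x, family = binomial("logit"))
## fits, but warns about non-integer #successes in a binomial glm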
You might also look up the topic of polychotomous logistic regression, which is implemented in several variants that might be useful in particular situations. Frank Harrell's book Regression Modeling Strategies has material on such techniques. You may also post further questions on CrossValidated.com if you need help choosing which route to go.
I am working with a dataset with 900,000 observations. There is a categorical variable x with 966 unique values that needs to be used as fixed effects. I am including the fixed effects using factor(x) in the regression. It gives me an error like this:
Error: cannot allocate vector of size 6.9 Gb
How can I fix this error? Or do I need to do something different in the regression to handle the fixed effects?
And then, how do I run a regression like this:
rlm(y~x+ factor(fe), data=pd)
The set of dummy variables constructed from a factor has very low information content. For example, considering only the columns of your model matrix corresponding to your 966-level categorical predictor, each row contains exactly one 1 and 965 zeros.
Thus you can generally save a lot of memory by constructing a sparse model matrix using Matrix::sparse.model.matrix() (or MatrixModels::model.Matrix(*, sparse=TRUE) as suggested by the sparse.model.matrix documentation). However, to use this it's necessary for whatever regression machinery you're using to accept a model matrix + response vector rather than requiring a formula (for example, to do linear regression you would need sparse.model.matrix + lm.fit rather than being able to use lm).
In contrast to @RuiBarradas's estimate of 3.5 Gb for a dense model matrix:
m <- Matrix::sparse.model.matrix(~x,
data=data.frame(x=factor(sample(1:966,size=9e5,replace=TRUE))))
format(object.size(m),"Mb")
## [1] "75.6 Mb"
If you are using the rlm function from the MASS package, something like this should work:
library(Matrix)
library(MASS)
mm <- sparse.model.matrix(~x + factor(fe), data=pd)
rlm(y=pd$y, x=mm, ...)
Note that I haven't actually tested this (you didn't give a reproducible example); this should at least get you past the step of creating the model matrix, but I don't know if rlm() does any internal computations that would break and/or make the model matrix non-sparse.
I have run a 3rd-order polynomial regression in R and called the "summary" function, but I need to be able to replicate the "predict" function in Excel. My current working code is below. Thank you for your help!
#Have access to this output:
AICFit <- lm(R60 ~ poly(M20, 3) + poly(M40, 3), data = mydata)
summary(AICFit)
#do not have access to output:
predict(AICFit, data.frame(M20 = 0.972375241, M40 = 0.989086129), interval = "prediction")
Basically, I don't have access to R when I have access to these numbers: 0.972375241,0.989086129.
I believe this is the equation that is the basis for the predict function, but I don't know how to compute this in Excel incorporating order 1, 2 and 3:
You do not have enough information from summary to calculate the prediction interval in Excel.
So the simple answer is: it is not possible without access to the variance-covariance matrix (although for the orthogonal polynomials in your model it is diagonal) and to the raw data. Moreover, you would need to extract the orthogonal polynomial coefficients themselves, which are generated recursively and are unique to each dataset you fit.
The formula you are referencing is for univariate linear regression; it is not applicable to your case, where you are doing multivariate polynomial regression in two variables: M20 and M40.
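That said, if all you need in Excel is the point prediction (not the interval), one option (a hedged sketch assuming the mydata object from the question) is to refit with raw polynomials, so that the coefficients multiply plain powers of M20 and M40 and the fitted value can be assembled in a spreadsheet:
rawFit <- lm(R60 ~ poly(M20, 3, raw = TRUE) + poly(M40, 3, raw = TRUE), data = mydata)
coef(rawFit)   ## intercept, then the M20^1..M20^3 and M40^1..M40^3 coefficients
vcov(rawFit)   ## also needed if you want to assemble intervals by hand
## fitted value at M20 = a, M40 = b:
## sum(coef(rawFit) * c(1, a, a^2, a^3, b, b^2, b^3)) -- easy to reproduce in Excel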
Assume x2-x4 are continuous predictors, each occupying one column in the design matrix created by lm() in R. I want to include x1, a categorical variable that has 3 levels.
Regression R code:
fit <- lm(y ~ as.factor(x1) + x2 + x3 + x4, data = mydata)
How can I print the design matrix from lm() in R and what would it look like? I need to know the default coding used in R so I can write contrast statements properly.
I think the model.matrix() function is what you're after.
As kjetil b halvorsen says, you can specify the model formula directly as an argument to model.matrix(). Or you can just hand model.matrix() the fitted model:
fit <- lm(y ~ as.factor(x1) + x2 + x3 + x4, data = mydata)
model.matrix(fit)
You can even get a design matrix for new data: model.matrix(fit, data=newdata)
The comment by kjetil b halvorsen helped me the most:
Call res <- lm() with the argument x=TRUE; the design matrix will then be returned in the model object res. Call str(res) to see the structure of res, and you will know how to get the design matrix from it. But it is easier to call model.matrix(y ~ x + f, data=...) with the same model formula you use in lm.
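As a small illustration of the default coding (a sketch with made-up data): R's default treatment contrasts drop the first factor level and absorb it into the intercept.
d <- data.frame(x1 = factor(c("a", "b", "c", "a")), x2 = rnorm(4), y = rnorm(4))
fit <- lm(y ~ x1 + x2, data = d)
model.matrix(fit)
## columns: (Intercept), x1b, x1c, x2 -- level "a" is the reference (contr.treatment)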
I want to compute a linear model in order to get the means of some Y variable adjusted for a categorical Q variable and some numeric X variables.
Someone told me I could easily get them with SAS, so I used this piece of code:
proc glm data=TABLE_R;
class Q(ref="Q1");
model Y = Q X2 X3 X4 / solution;
lsmeans Q/ stderr pdiff cov out=adjmeans;
run;
But being much more comfortable with R, I wanted to replicate this procedure, and after some research I ended up with this code:
m = glm(Y ~ Q + X2 + X3 + X4, data=db) #using lm() didn't change anything
emmeans::emmeans(m, "Q")
The problem is that, while very close, the model coefficients are different. Here is an example with the intercept and 2 levels of Q:
#in R
(Intercept) Q2 Q3
-0.1790444126 0.0051160461 -0.0013756817
#in SAS
(Intercept) Q2 Q3
-0.1767853086 0.0016709301 -0.0031477746
Actually, in SAS, I get a message saying that the coefficients needed additional computation (which I unfortunately don't understand; does R's glm() lack this?):
Note: The X'X matrix has been found to be singular, and a generalized
inverse was used to solve the normal equations. Terms whose estimates
are followed by the letter 'B' are not uniquely estimable.
Which option should I add here or there so I can get the same results with both SAS and R?
If I cannot, how can I choose which method is best suited?
Useful posts: Proc GLM (SAS) using R, X'X matrix found to be singular
EDIT: This is very strange, but the numbers of observations used are different in SAS and R:
#SAS
Observations read: 81733
Observations used: 9000
#R
16357 Residual
(88017 observations deleted due to missingness)
You will get the same coefficients if you first do
options(contrasts = c("contr.SAS", "contr.poly"))
before fitting the model. This will cause R to use the same parameterization that SAS uses.
However, even without this change, the fitted values from R will be identical to those from SAS, and the EMMs from R will match the lsmeans from SAS. That's because we are not really changing the model; we are only changing how it is parameterized.
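A sketch of the full sequence (assuming the db data frame and the model from the question):
old <- options(contrasts = c("contr.SAS", "contr.poly"))  ## last level becomes the reference
m_sas <- glm(Y ~ Q + X2 + X3 + X4, data = db)
coef(m_sas)                     ## coefficients now on the SAS parameterization
emmeans::emmeans(m_sas, "Q")    ## EMMs / lsmeans are unchanged
options(old)                    ## restore the previous contrasts setting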