How can multivariate linear regression be linear in nature?

As per my limited knowledge, linear functions have only two variables which define them, namely x and y.
However, as per multivariate linear regression,
h(x) = (theta vector transpose) * (x vector)
where theta vector = (n+1)x1 vector of parameters theta0, theta1, ..., thetan
x vector = (n+1)x1 vector of input variables x0, x1, x2, ..., xn
There are multiple variables involved. Does it not change the nature of the graph and consequently the nature of the function itself?

linear functions have only two variables which define them, namely x and y
This is not accurate; the definition of a linear function is a function that is linear in its independent variables.
What you refer to is simply the special case of only one independent variable x, where
y = a*x + b
and the plot in the (x, y) axes is a straight line, hence the historical origin of the term "linear" itself.
In the general case of k independent variables x1, x2, ..., xk, the linear function equation is written as
y = a1*x1 + a2*x2 + ... + ak*xk + b
whose form you can immediately recognize as the same as the multiple linear regression equation.
Notice that your use of the term multivariate is also wrong - you actually mean multivariable, i.e. multiple independent variables (x's); the first term means multiple dependent variables (y's):
Note that multivariate regression is distinct from multivariable
regression, which has only one dependent variable.
(source)
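
As a quick illustration (a minimal sketch with made-up data; the variable names are arbitrary), fitting a multiple linear regression in R recovers exactly the coefficients a1, a2 and the intercept b of such a linear function:
set.seed(42)
x1 <- runif(100)
x2 <- runif(100)
y  <- 2 * x1 - 3 * x2 + 5 + rnorm(100, sd = 0.1)   # y = a1*x1 + a2*x2 + b + noise
fit <- lm(y ~ x1 + x2)
coef(fit)   # approximately: (Intercept) 5, x1 2, x2 -3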

Related

get pairwise difference from emmeans with quadratic covariate interaction

I have a factor X with three levels and a continuous covariate Z.
To predict the continuous variable Y, I have the model
model <- lm(Y ~ X * poly(Z, 2, raw = TRUE))
I know that the emmeans package in R has the function emtrends() to estimate the pairwise differences between factor-level slopes, with a p-value adjustment.
emtrends(model, pairwise ~ X, var = "Z")
However, this works when Z is a linear term; here I have a quadratic term. I guess this means I have to look at pairwise differences at pre-specified values of Z, and get something like the local "slope" trend?
Is this possible to do with emmeans? And how would I need to do the p-value adjustment? Does it scale with the number of grid points, so that as the number of grid values where I do the comparison increases, Bonferroni becomes too conservative?
Also, how would I do the pairwise comparison of the mean (prediction) at different grid values with emmeans (or is this the same regardless of using poly(), since it relies only on model predictions)?
thanks.
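
One possible approach (a sketch, untested against this model; it assumes the model object above): both emtrends() and emmeans() accept an at argument to evaluate at pre-specified covariate values, and adjustments other than Bonferroni (e.g. the multivariate-t "mvt" adjustment) account for the correlation between comparisons, so they do not become as conservative as Bonferroni when the grid grows:
library(emmeans)
# Local slopes dY/dZ per factor level, compared pairwise at chosen Z values:
emt <- emtrends(model, pairwise ~ X, var = "Z", at = list(Z = c(-1, 0, 1)))
summary(emt$contrasts, adjust = "mvt")
# Pairwise comparison of the predicted means at the same grid values:
emmeans(model, pairwise ~ X | Z, at = list(Z = c(-1, 0, 1)))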

Combining two polynomial equations with different degrees in R

I have two polynomial regression lines
v <- lm(game_rating ~ poly(votes, 2), data = board_games)
t <- lm(game_rating ~ poly(timeplay, 4), data = board_games)
Now the question is how to combine these two lines into one to get a new regression game_rating = f(votes, timeplay). What can I do to add them together?
I tried to add them using "+", but R throws the error "non-numeric argument to binary operator":
vt <- lm(game_rating ~ poly(votes, 2), data = board_games) + lm(game_rating ~ poly(timeplay, 4), data = board_games)
Note: regression line 1 models the response game_rating as a function of the predictor votes, and a polynomial of degree 2 is the best line for that prediction. Same for line 2.
Adding two linear models probably won't get you what you really aim to get from combining both models; what you need is to run one model with both variables.
Assuming I understand what you want to achieve by adding these models: instead of creating independent models to predict Y ~ x and Y ~ z and then adding them, you should run one model:
Y ~ x + z
In your specific case:
lm(data = board_games, game_rating ~ poly(votes, 2) + poly(timeplay, 4))
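
As a quick usage sketch (assuming board_games really has columns game_rating, votes and timeplay), you can then fit the combined model once and predict from it:
vt <- lm(game_rating ~ poly(votes, 2) + poly(timeplay, 4), data = board_games)
summary(vt)   # one coefficient table covering both polynomial terms
predict(vt, newdata = data.frame(votes = 1500, timeplay = 60))   # example values, made up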

Finding the importance assigned to a variable inside a Prophet model?

I am building datasets and training unique models for combinations of x1, x2, x3. Think:
prophet1 <- fit.prophet(data.frame(ds, y, x1))
prophet2 <- fit.prophet(data.frame(ds, y, x2, x3))
prophet3 <- fit.prophet(data.frame(ds, y, x3))
I am then setting x1, x2, x3 to zero for each of the models and evaluating the effect on y had that variable not been introduced. My question is: is there any way to tell from the model object whether x1 in prophet1 contributed more than x2 + x3 in prophet2, without explicitly predicting the dataframe? That is, can we tell whether setting x1 to zero changes y more than setting x2 and x3 to zero does, just by looking at the generated model? Does x1 have a higher regression coefficient than x2 + x3 and, as such, change y more?
I was digging around and found this:
model$param$k          # base trend growth rate
model$param$m          # trend offset
model$param$sigma_obs  # observation noise
model$param$beta       # regressor coefficients
Source: https://github.com/facebook/prophet/issues/501
If I were to place x1, x2, and x3 in the same dataframe and evaluate y, I could evaluate this coefficient by looking at the beta values. However, I don't know how to find this out if they are in separate dataframes across different models.
But plotting sum(beta), k, m, or sigma_obs against the difference between y and the predictions with the variable set to zero did not yield any relationship at all. Is it possible to extract how important the variables used to model y are from a Prophet model, and whether Prophet believes the effect is positive or negative? If so, how can I do so?
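
One caveat that may explain the lack of relationship: Prophet standardizes extra regressors internally, so the raw beta values live on the standardized scale and are not directly comparable across models fit to different dataframes. A sketch of a more direct route (assuming the regressors were registered via add_regressor() and a recent prophet version that exports regressor_coefficients()):
library(prophet)
m <- prophet()
m <- add_regressor(m, "x1")
m <- fit.prophet(m, df)   # df assumed to contain columns ds, y, x1
# Coefficients mapped back to the original regressor scale; the sign
# indicates whether the model treats the effect as positive or negative:
regressor_coefficients(m)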

Quadratic GLM in R with interactions?

So I have a question about utilizing quadratic (second-order) predictors with GLMs in R. Basically I have three predictor variables (x, y, z) and a response variable (let's call it ozone).
X, Y, and Z are not quadratic predictors yet, so I square them:
X2 <- x^2   # same for y and z
Now I understand that if I wanted to model ozone based on these predictor variables, I would use the poly() or polym() function.
However, when it comes to using interaction terms between these three variables, that's where I get lost. For example, if I wanted to model the interaction between the quadratic predictors of X and Y, I believe I would type something like this:
ozone ~ x + x2 + y + y2 + x*y + x2*y + x*y2 + x2*y2 (I hope this is right)
My question is: is there an easier way of inputting this (with three terms, that's a lot of typing)? My other question is: why does the quadratic predictor flip signs in the coefficients? When I just run the predictor variable X, the coefficient is positive, but with a quadratic predictor the coefficient almost always ends up being negative.
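
One compact option (a sketch, not tested on your data; dat and the variable names are assumed): polym() generates all polynomial terms up to a given total degree, cross terms included, so you don't have to type them out. Note that polym() caps the total degree, so degree-4 cross terms like x2*y2 need the poly() * poly() form instead:
fit <- glm(ozone ~ polym(x, y, z, degree = 2, raw = TRUE), data = dat)
# For two variables, poly(x, 2) * poly(y, 2) expands to all pairwise
# products of the x terms and the y terms, including x^2 * y^2:
fit_xy <- glm(ozone ~ poly(x, 2, raw = TRUE) * poly(y, 2, raw = TRUE), data = dat)
As for the sign flips: with raw polynomials, x and x^2 are strongly correlated, so the coefficients share the work between them and individual signs can flip; orthogonal polynomials (the poly() default, raw = FALSE) avoid that collinearity.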

Obtain the probability equation represented by plotmo plots

I want to obtain the equations of the probability functions represented by plotmo (R). That is, the equations of the model when varying one or two predictors while holding the other predictors constant at their mean values. I want an easy way to obtain the mathematical equation, because I have to make many models with different variables.
If my model is like this:
glm(formula = pres_aus ~ pH_sp + Annual_prec + I(pH_sp^2) + I(Annual_prec^2), family = binomial(link = "logit"), data = puntos_calibrado)
how can I do it?
No data example provided, so no testing done, but couldn't you just skip the construction of a symbolic expression and do something along the lines of:
model.matrix(data.frame(one = 1, dat)) %*% coef(mdl.fit)
# where mdl.fit is returned from glm()
In a sense, this is the R matrix representation of the formula sum(beta_i * X_i). If you want to specify a mean value for a particular column, just pull that dataframe apart and use only parts of it for the calculation. So for the first column held at its mean:
model.matrix(data.frame(one = 1, mn1 = mean(dat[[1]]), dat[-1])) %*% coef(mdl.fit)
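
An alternative sketch that avoids building the matrix by hand (untested; it assumes mdl.fit is the glm() fit above and uses the column names from that call): let predict() evaluate the model over a grid of one predictor while the other is held at its mean, and apply the inverse logit to get probabilities:
nd <- data.frame(pH_sp = seq(min(puntos_calibrado$pH_sp), max(puntos_calibrado$pH_sp), length.out = 100),
                 Annual_prec = mean(puntos_calibrado$Annual_prec))
predict(mdl.fit, newdata = nd, type = "response")   # type = "response" applies plogis() to the linear predictor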
