How to conduct a joint significance test in seemingly unrelated regression in R

I'm trying to conduct a joint test of significance in a seemingly unrelated regression setup with robust standard errors. I have three outcomes, Y1, Y2, and Y3, and I want to test the joint null hypothesis that the average effect of the treatment Z is zero for all three outcomes.
I think I have the model set up correctly, but I don't think I have the hypothesis.matrix argument set correctly in car::linearHypothesis.
Here's some data:
library(tibble)
library(car)
library(systemfit)
set.seed(343)
N = 800
dat <-
  tibble(
    U = rnorm(N),
    Z = rbinom(N, 1, 0.5),
    Y = 0.2 * Z + U,
    Y1 = Y + rnorm(N, sd = 0.3),
    Y2 = Y + rnorm(N, sd = 0.5),
    Y3 = Y + rnorm(N, sd = 0.5)
  )
Here's the seemingly unrelated regression fit:
sur <- systemfit(list(Y1 ~ Z, Y2 ~ Z, Y3 ~ Z), method = "SUR", data = dat)
summary(sur)
Which is identical to the OLS fit in this case:
ols <- lm(cbind(Y1, Y2, Y3) ~ Z, data = dat)
summary(ols)
Which is useful, because I need to estimate robust standard errors for this test:
linearHypothesis(ols, hypothesis.matrix = "Z = 0", white.adjust = "hc2")
This last line is the one that I think is incorrect, because the individual coefficients all have lower p-values than the joint test, but I could be wrong.

Looks right to me. You'd get the same result by estimating the null model (ols0 below) and using anova() to test the difference between the estimated and null models.
library(tibble)
library(car)
#> Loading required package: carData
set.seed(343)
N = 800
dat <-
  tibble(
    U = rnorm(N),
    Z = rbinom(N, 1, 0.5),
    Y = 0.2 * Z + U,
    Y1 = Y + rnorm(N, sd = 0.3),
    Y2 = Y + rnorm(N, sd = 0.5),
    Y3 = Y + rnorm(N, sd = 0.5)
  )
ols <- lm(cbind(Y1, Y2, Y3) ~ Z, data = dat)
linearHypothesis(ols, hypothesis.matrix = "Z = 0")
#>
#> Sum of squares and products for the hypothesis:
#> Y1 Y2 Y3
#> Y1 3.201796 4.693391 3.359617
#> Y2 4.693391 6.879863 4.924734
#> Y3 3.359617 4.924734 3.525216
#>
#> Sum of squares and products for error:
#> Y1 Y2 Y3
#> Y1 829.5535 756.1586 770.0808
#> Y2 756.1586 965.5959 770.4636
#> Y3 770.0808 770.4636 980.0664
#>
#> Multivariate Tests:
#> Df test stat approx F num Df den Df Pr(>F)
#> Pillai 1 0.0073689 1.96972 3 796 0.11703
#> Wilks 1 0.9926311 1.96972 3 796 0.11703
#> Hotelling-Lawley 1 0.0074236 1.96972 3 796 0.11703
#> Roy 1 0.0074236 1.96972 3 796 0.11703
ols0 <- lm(cbind(Y1, Y2, Y3) ~ 1, data = dat)
anova(ols, ols0, test="Pillai")
#> Analysis of Variance Table
#>
#> Model 1: cbind(Y1, Y2, Y3) ~ Z
#> Model 2: cbind(Y1, Y2, Y3) ~ 1
#> Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
#> 1 798 0.48198
#> 2 799 1 0.48257 0.0073689 1.9697 3 796 0.117
Created on 2022-07-08 by the reprex package (v2.0.1)
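For completeness, the same joint restriction can also be stated on the SUR fit itself. Below is a minimal sketch, assuming the default equation labels eq1, eq2, eq3 that systemfit assigns to an unnamed list of formulas (check names(coef(sur)) if yours differ); it uses the SUR covariance matrix rather than the HC2-robust one.
# Joint test that the Z coefficient is zero in all three equations,
# stated directly on the systemfit object (labels assumed to be eq1-eq3).
restrictions <- c("eq1_Z = 0", "eq2_Z = 0", "eq3_Z = 0")
linearHypothesis(sur, restrictions)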

Related

How can I draw 3d hyperplane to illustrate decision boundary using ggplot?

I have a data frame df with 3D input data, x1, x2 and x3, and a target t. I used logistic regression to create the decision boundary
a0 + a1 * x1 + a2 * x2 + a3 * x3 = 0
I was wondering if there is a way to draw this 3D hyperplane (along with the 3D input data) using ggplot to illustrate the decision boundary created by the logistic regression.
Thanks
You cannot have a true 3D plot in ggplot2, but there are ways to represent a 3D plane using contour lines or colour fills. Here's an example using a coloured raster layer to represent a plane.
I assume from the question that you want the decision boundary to be where the probability is 0.5 (i.e. the log odds equal 0).
First we need a logistic regression model, so in the absence of any data in the question, let's create some that will allow us a nice example:
# Create dummy data for logistic regression
set.seed(69)
x1 <- sample(100, 1000, TRUE)
x2 <- sample(100, 1000, TRUE)
x3 <- sample(100, 1000, TRUE)
log_odds <- -1 + 0.02 * x1 + 0.005 * x2 - 0.03 * x3 + rnorm(1000, 0, 2)
odds <- exp(log_odds)
probs <- odds/(1 + odds)
y <- rbinom(1000, 1, probs)
df <- data.frame(y, x1, x2, x3)
Now we have a binary outcome, y, whose value is dependent on the values of the three independent variables x1, x2 and x3, so we can run a logistic regression and grab its coefficients:
# Run logistic regression and extract coefficients
logistic_model <- glm(y ~ x1 + x2 + x3, data = df, family = binomial)
summary(logistic_model)
#>
#> Call:
#> glm(formula = y ~ x1 + x2 + x3, family = binomial, data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.5058 -0.8689 -0.6296 1.1264 2.3669
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.888782 0.232728 -3.819 0.000134 ***
#> x1 0.012369 0.002562 4.828 1.38e-06 ***
#> x2 0.008031 0.002478 3.241 0.001191 **
#> x3 -0.020676 0.002560 -8.076 6.67e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 1235.0 on 999 degrees of freedom
#> Residual deviance: 1129.9 on 996 degrees of freedom
#> AIC: 1137.9
#>
#> Number of Fisher Scoring iterations: 4
coefs <- coef(logistic_model)
Our plot is going to show x1 on the x axis and x2 on the y axis. The colour at each point (x1, x2) will be the value of x3 that produces log odds of 0. We can get this by rearranging the formula a0 + a1 * x1 + a2 * x2 + a3 * x3 = 0 that you showed in the question:
# Create a function that returns the value of x3 at p = 0.5, given x1 and x2
find_x3 <- function(x1, x2) (-coefs[1] -coefs[2] * x1 - coefs[3] * x2)/coefs[4]
Now we can create a data frame that contains all values of x1 and x2 between 1 and 100, and find the appropriate value of x3 that gives log odds of 0 for each point on this grid:
# Create a data frame to plot the 3d plane where p = 0.5
plot_df <- expand.grid(x2 = 1:100, x1 = 1:100)
plot_df$x3 <- find_x3(plot_df$x1, plot_df$x2)
head(plot_df)
#> x2 x1 x3
#> 1 1 1 -41.99975
#> 2 2 1 -41.61133
#> 3 3 1 -41.22291
#> 4 4 1 -40.83450
#> 5 5 1 -40.44608
#> 6 6 1 -40.05766
We can confirm this gives us the values of our decision boundary by running predict with this data frame as newdata. The values should all be 0 (or very close to 0):
head(predict(logistic_model, newdata = plot_df))
#> 1 2 3 4 5
#> 0.000000e+00 0.000000e+00 -1.110223e-16 0.000000e+00 0.000000e+00
Good.
Finally, we can plot the result with a colorful divergent scale to show the values of x1, x2 and x3 that together give your decision boundary:
library(ggplot2)
ggplot(plot_df, aes(x1, x2, fill = x3)) +
  geom_raster() +
  scale_fill_gradientn(colours = c("deepskyblue4", "forestgreen", "gold", "red")) +
  coord_equal() +
  theme_classic()
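The contour-line representation mentioned at the start works on the same grid; here's a minimal sketch reusing plot_df from above, drawing iso-lines of x3 instead of a filled raster:
ggplot(plot_df, aes(x1, x2, z = x3)) +
  geom_contour(colour = "steelblue") +  # iso-lines of x3 over the (x1, x2) grid
  coord_equal() +
  theme_classic()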
If you're looking for a genuine 3D perspective plot, you could try base R's persp() function:
persp(x = 1:100, y = 1:100, z = matrix(plot_df$x3, ncol = 100),
      xlab = "x1", ylab = "x2", zlab = "x3",
      theta = -45, phi = 25, d = 5,
      col = "gold", border = "orange",
      ticktype = "detailed")
Created on 2020-08-16 by the reprex package (v0.3.0)

Covariate dependent Markov models? Plot state transition probability along gradient of covariate values

The data consist of four variables: id, plus x1 and x2, continuous variables that are correlated with y, a binary variable. The values 0 and 1 of the binary variable represent different states. Is it possible to use Markov chain models to calculate and plot the state transition probability along the gradient of covariate values for each id, and subsequently for the pooled data?
set.seed(1)
id = rep(1, 100)
x1 = rnorm(100)
x2 = rnorm(100)
z = 1 + 2*x1 + 3*x2
pr = 1/(1 + exp(-z))
y = rbinom(100, 1, pr)
a <- data.frame(id, x1, x2, y)

set.seed(2)
id = rep(2, 100)
x1 = rnorm(100)
x2 = rnorm(100)
z = 1 + 2*x1 + 3*x2
pr = 1/(1 + exp(-z))
y = rbinom(100, 1, pr)
b <- data.frame(id, x1, x2, y)

set.seed(3)
id = rep(3, 100)
x1 = rnorm(100)
x2 = rnorm(100)
z = 1 + 2*x1 + 3*x2
pr = 1/(1 + exp(-z))
y = rbinom(100, 1, pr)
c <- data.frame(id, x1, x2, y)

d <- rbind(a, b, c)

Multiplying regression results in R

I just ran a regression in R. I would like to multiply each coefficient by some variables. How can I do it?
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.210e+00 7.715e-01 1.568 0.13108
SDCHRO_I -1.846e-01 2.112e-01 -0.874 0.39157
functional_cognitive_level3 4.941e-02 7.599e-02 0.650 0.52224
rev_per_members -4.955e-06 5.827e-06 -0.850 0.40432
And I want something like this:
1.210e+00 + -1.846e-01 * var1 + 4.941e-02 * var2 + -4.955e-06 * var3
Is there a way to do it?
You can use model.matrix.
data(mtcars)
lm1 <- lm(mpg ~ cyl + disp + hp, data = mtcars)
# multiply the coefficient vector by the transposed design matrix
res <- coef(lm1) %*% t(model.matrix(lm1))
all(res == predict(lm1))
#[1] TRUE
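If you only need the linear combination for one specific set of values, as in your example, the same idea works for a single row. A minimal sketch with hypothetical values for the mtcars predictors:
# hypothetical new values standing in for var1, var2, var3
newvals <- data.frame(cyl = 6, disp = 200, hp = 110)
sum(coef(lm1) * c(1, newvals$cyl, newvals$disp, newvals$hp))
predict(lm1, newdata = newvals)  # returns the same number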
You can access the coefficients with model$coefficients.
For example, if you want to multiply all coefficients with 10, you can do
df = data.frame(x = runif(100), y = runif(100), z = runif(100))
mod = lm(formula = y ~ x*z, data = df)
mod$coefficients
#> (Intercept) x z x:z
#> 0.6449097 -0.1989884 -0.3962655 0.4621273
mod$coefficients*10
#> (Intercept) x z x:z
#> 6.449097 -1.989884 -3.962655 4.621273
Created on 2020-07-10 by the reprex package (v0.3.0)
However, if you want to do it like in your example, you need to access the individual coefficients with model$coefficients[i], e.g.
df = data.frame(x = runif(100), y = runif(100), z = runif(100))
mod = lm(formula = y ~ x*z, data = df)
mod$coefficients[1]*10
#> (Intercept)
#> 5.994662
mod$coefficients[2]*10
#> x
#> -1.687928
Created on 2020-07-10 by the reprex package (v0.3.0)
You can even do this dynamically by looping over the length of the coefficients object.
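A minimal sketch of that dynamic approach, repeating the hypothetical df/mod setup from above; the columns of model.matrix() line up with the coefficient vector, so the loop rebuilds the fitted values term by term.
df  <- data.frame(x = runif(100), y = runif(100), z = runif(100))
mod <- lm(formula = y ~ x*z, data = df)
X   <- model.matrix(mod)              # columns match names(coef(mod))
fit <- numeric(nrow(X))
for (i in seq_along(coef(mod))) {
  fit <- fit + coef(mod)[i] * X[, i]  # add coefficient i times its column
}
all.equal(unname(fit), unname(predict(mod)))
#> [1] TRUE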

How to plot lm slope modeled using poly()?

I need to plot the relationship between x and y where polynomials of x predict y. This is done using the poly() function in order to ensure that the polynomials are orthogonal.
How do I plot this relationship considering the linear, quadratic and cubic terms together? The issue is that the coefficients for the different terms are not on the same scale as x.
I provide some example code below. I have tried reassigning the corresponding contrast values of each polynomial to x.
This solution gives impossible predicted values.
Thank you in advance for your help!
Best wishes,
Eric
Here is an example code:
x = sample(0:6, 100, replace = TRUE)
y = (x*0.2) + (x^2*.05) + (x^3*0.001)
y = y + rnorm(100)
x = poly(x, 3)
m = lm(y ~ x)
TAB = summary(m)$coefficients

### Reassigning the corresponding contrast values to each polynomial of x:
eq = function(x, TAB, start) {
  # argument 'start' gives the position of the linear coefficient; quadratic and cubic follow
  pols = poly(x, 3)
  x1 = pols[, 1]; x2 = pols[, 2]; x3 = pols[, 3]
  TAB[1, 1] + x1[x]*TAB[start, 1] + x2[x]*TAB[start+1, 1] + x3[x]*TAB[start+2, 1]
}
plot(eq(0:7, TAB, 2))
Actually, you can use poly() directly in the formula for lm(): y ~ poly(x, 3) might be what you want.
For the plot, I'll use the ggplot2 package, which has the geom_smooth() function to draw the fitted curve. You should specify the method = "lm" argument and the corresponding formula.
library(tidyverse)
x <- sample(0:6,100,replace = TRUE)
y <- (x*0.2) + (x^2*.05) + (x^3*0.001)
eps <- rnorm(100)
(df <- tibble(y = y + eps, x = x))
#> # A tibble: 100 x 2
#> y x
#> <dbl> <int>
#> 1 3.34 4
#> 2 1.23 5
#> 3 1.38 3
#> 4 -0.115 2
#> 5 1.94 5
#> 6 3.87 6
#> 7 -0.707 3
#> 8 0.954 3
#> 9 1.19 3
#> 10 -1.34 0
#> # ... with 90 more rows
Using your simulated data set,
df %>%
  ggplot() + # declare the data set first
  aes(x, y) + # aesthetic
  geom_point() + # data points
  geom_smooth(method = "lm", formula = y ~ poly(x, 3)) # lm fit
If you want to remove the points, just erase geom_point():
df %>%
  ggplot() +
  aes(x, y) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3))
Transparency solution: set alpha to less than 1:
df %>%
  ggplot() +
  aes(x, y) +
  geom_point(alpha = .3) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3))
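If you would rather compute the fitted curve yourself (for example for a base-R plot), note that predict() handles the rescaling of the orthogonal polynomial terms for you, so you never need to reassign contrast values by hand. A minimal sketch along the lines of the question's example, keeping poly() inside the formula:
set.seed(1)
x <- sample(0:6, 100, replace = TRUE)
y <- x*0.2 + x^2*0.05 + x^3*0.001 + rnorm(100)
m <- lm(y ~ poly(x, 3))                  # poly() stays inside the formula
grid <- data.frame(x = seq(0, 6, by = 0.1))
plot(x, y)
lines(grid$x, predict(m, newdata = grid), col = "steelblue", lwd = 2)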

Multivariate regression

I have two dependent variables that both depend on two variables AND on each other. Can this be modelled in R (it must be!)? I can't figure out how; anyone have a hint?
In clear terms, I want to model my data with the following model:
Y1 = X1*coef1 + X2*coef2
Y2 = X1*coef2 + X2*coef3
Note: coef2 appears in both lines.
Xi and Yi are the input and output data, respectively.
I got this far:
lm(Y1 ~ X1 + X2, mydata)
Now how do I add the second line of the model, including the cross dependency?
Your help is greatly appreciated!
Cheers, Bastiaan
Try this:
# sample data - true coefs are 2, 3, 4
set.seed(123)
n <- 35
DF <- data.frame(X1 = 1, X2 = 1:n, X3 = (1:n)^2)
DF <- transform(DF, Y1 = X1 * 2 + X2 * 3 + rnorm(n),
                    Y2 = X1 * 3 + X2 * 4 + rnorm(n))

# construct data frame for required model
DF2 <- with(DF, data.frame(y  = c(Y1, Y2),
                           x1 = c(X1, 0*X1),
                           x2 = c(X2, X1),
                           x3 = c(0*X2, X2)))

lm(y ~ . - 1, DF2)
We see it does, indeed, recover the true coefs of 2, 3, 4:
> lm(y ~ . - 1, DF2)
Call:
lm(formula = y ~ . - 1, data = DF2)
Coefficients:
x1 x2 x3
2.084 2.997 4.007
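Once the stacked model is fitted you can map the coefficients back onto the two original equations. A minimal sketch, assuming the fit above is stored in an object:
fit <- lm(y ~ . - 1, DF2)
b   <- coef(fit)  # b["x1"] = coef1, b["x2"] = coef2, b["x3"] = coef3
Y1_hat <- DF$X1 * b["x1"] + DF$X2 * b["x2"]  # Y1 = X1*coef1 + X2*coef2
Y2_hat <- DF$X1 * b["x2"] + DF$X2 * b["x3"]  # Y2 = X1*coef2 + X2*coef3
head(cbind(Y1_hat, Y2_hat))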
