coefplot in R with parts of independent variables - r

I would like a coefplot that shows only some of the independent variables. My regression is a fixed effects regression, as follows:
aa1 <-glm(Eighty_Twenty ~ Market_Share_H+Market_Share_L+Purchase_Frequency_H+Purchase_Frequency_L+factor(product_group))
coefplot(aa1)
However, I do NOT want to plot the coefficients of the factor(product_group) terms, since there are product groups. Instead, I would like a coefplot with only the coefficients of the other variables. How can I do this?

From the help pages (see ?coefplot.default), you can select which predictors or coefficients you want in your plot.
# some example data
df <- data.frame(Eighty_Twenty = rbinom(100,1,0.5),
Market_Share_H = runif(100),
Market_Share_L = runif(100),
Purchase_Frequency_H = rpois(100, 40),
Purchase_Frequency_L = rpois(100, 40),
product_group = sample(letters[1:3], 100, TRUE))
# model
aa1 <- glm(Eighty_Twenty ~ Market_Share_H+Market_Share_L +
Purchase_Frequency_H + Purchase_Frequency_L +
factor(product_group), df, family="binomial")
library(coefplot)
# coefficient plot with the intercept
coefplot(aa1, coefficients=c("(Intercept)","Market_Share_H","Market_Share_L",
"Purchase_Frequency_H","Purchase_Frequency_L"))
# coefficient plot specifying predictors (no intercept)
coefplot(aa1, predictors=c("Market_Share_H","Market_Share_L" ,
"Purchase_Frequency_H","Purchase_Frequency_L"))

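When there are many product groups, typing every coefficient name by hand gets tedious. A minimal sketch of one alternative, assuming the fixed-effect terms can be recognised by the string "product_group" in their names: build the vector of names to keep from names(coef(aa1)) and pass it to the coefficients argument. The variable names all_coefs and keep are illustrative, and the plotting call only runs if coefplot is installed.

```r
# Example data mirroring the answer above
set.seed(1)
df <- data.frame(Eighty_Twenty = rbinom(100, 1, 0.5),
                 Market_Share_H = runif(100),
                 Market_Share_L = runif(100),
                 Purchase_Frequency_H = rpois(100, 40),
                 Purchase_Frequency_L = rpois(100, 40),
                 product_group = sample(letters[1:3], 100, TRUE))
aa1 <- glm(Eighty_Twenty ~ Market_Share_H + Market_Share_L +
             Purchase_Frequency_H + Purchase_Frequency_L +
             factor(product_group), data = df, family = "binomial")

# Keep every coefficient whose name does not mention product_group
all_coefs <- names(coef(aa1))
keep <- all_coefs[!grepl("product_group", all_coefs, fixed = TRUE)]
print(keep)

# Plot only the retained coefficients (skipped if coefplot is not installed)
if (requireNamespace("coefplot", quietly = TRUE)) {
  print(coefplot::coefplot(aa1, coefficients = keep))
}
```

Drop "(Intercept)" from keep as well if the intercept should not appear in the plot.
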
Related

Plotting quadratic curves with poisson glm with interactions in categorical/numeric variables

I want to know if it's possible to plot quadratic curves with Poisson glm with interactions in categorical/numeric variables. In my case:
##Data set artificial
set.seed(20)
d <- data.frame(
behv = c(rpois(100,10),rpois(100,100)),
mating=sort(rep(c("T1","T2"), 200)),
condition = scale(rnorm(200,5))
)
#Condition quadratic
d$condition2<-(d$condition)^2
# Fitted Poisson GLM
md<-glm(behv ~ mating + condition + condition2, data=d, family=poisson)
summary(md)
In a situation where mating, condition and condition2 are significant in the model, I do the following:
# Create x values
x<-d$condition##
x2<-(d$condition)^2
# T1 estimation
y1<-exp(md$coefficients[1]+md$coefficients[3]*x+md$coefficients[4]*x2)
#
# T2 estimation
y2<-exp(md$coefficients[1]+md$coefficients[2]+md$coefficients[3]*x+md$coefficients[4]*x2)
#
#
# Separate the data set
d_T1<-d[d[,2]!="T2",]
d_T2<-d[d[,2]!="T1",]
#Plot
plot(d_T1$condition,d_T1$behv,main="", xlab="condition", ylab="behv",
xlim=c(-4,3), ylim=c(0,200), col= "black")
points(d_T2$condition,d_T2$behv, col="gray")
lines(x,y1,col="black")
lines(x,y2,col="grey")
#
This doesn't work and I don't get the curves I want. I'd like one curve for T1 and another for T2 of the mating variable. Is there a solution for this?
In the code below, we use the poly function to generate a quadratic model without needing to create an extra column in the data frame. In addition, we create a prediction data frame to generate model predictions across the range of condition values and for each level of mating. The predict function with type="response" generates predictions on the scale of the outcome, rather than on the linear predictor scale, which is the default. Also, we change 200 to 100 in creating the data for mating in order to avoid having the exact same outcome data for each level of mating.
library(ggplot2)
# Fake data
set.seed(20)
d <- data.frame(
behv = c(rpois(100,10),rpois(100,100)),
mating=sort(rep(c("T1","T2"), 100)), # Changed from 200 to 100
condition = scale(rnorm(200,5))
)
# Model with quadratic condition
md <- glm(behv ~ mating + poly(condition, 2, raw=TRUE), data=d, family=poisson)
#summary(md)
# Get predictions at range of condition values
pred.data = data.frame(condition = rep(seq(min(d$condition), max(d$condition), length=50), 2),
mating = rep(c("T1","T2"), each=50))
pred.data$behv = predict(md, newdata=pred.data, type="response")
Now plot with ggplot2 and with base R:
ggplot(d, aes(condition, behv, colour=mating)) +
geom_point() +
geom_line(data=pred.data)
plot(NULL, xlim=range(d$condition), ylim=range(d$behv),
xlab="Condition", ylab="behv")
with(subset(d, mating=="T1"), points(condition, behv, col="red"))
with(subset(d, mating=="T2"), points(condition, behv, col="blue"))
with(subset(pred.data, mating=="T1"), lines(condition, behv, col="red"))
with(subset(pred.data, mating=="T2"), lines(condition, behv, col="blue"))
legend(-3, 70, title="Mating", legend=c("T1","T2"), pch=1, col=c("red", "blue")) # colours match the points: T1 red, T2 blue
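If uncertainty bands are wanted as well, one common approach is to predict on the link (log) scale with se.fit = TRUE and back-transform with exp(). The sketch below is not part of the original answer. It assumes a Wald-type 95% interval (fit plus or minus 1.96 standard errors) and stores condition as a plain numeric vector rather than the one-column matrix that scale() returns.

```r
set.seed(20)
d <- data.frame(
  behv = c(rpois(100, 10), rpois(100, 100)),
  mating = rep(c("T1", "T2"), each = 100),
  condition = as.numeric(scale(rnorm(200, 5)))  # plain numeric, not a matrix
)
md <- glm(behv ~ mating + poly(condition, 2, raw = TRUE), data = d, family = poisson)

# Prediction grid: 50 condition values for each level of mating
pred.data <- expand.grid(
  condition = seq(min(d$condition), max(d$condition), length.out = 50),
  mating = c("T1", "T2"),
  stringsAsFactors = FALSE
)

# Predict on the log scale, then back-transform fit and interval limits
lp <- predict(md, newdata = pred.data, type = "link", se.fit = TRUE)
pred.data$behv  <- exp(lp$fit)
pred.data$lower <- exp(lp$fit - 1.96 * lp$se.fit)
pred.data$upper <- exp(lp$fit + 1.96 * lp$se.fit)

# Base R plot: solid fitted curves, dashed interval limits
cols <- c(T1 = "red", T2 = "blue")
plot(d$condition, d$behv, col = cols[d$mating], xlab = "Condition", ylab = "behv")
for (m in names(cols)) {
  p <- pred.data[pred.data$mating == m, ]
  lines(p$condition, p$behv, col = cols[m])
  lines(p$condition, p$lower, col = cols[m], lty = 2)
  lines(p$condition, p$upper, col = cols[m], lty = 2)
}
legend("topleft", title = "Mating", legend = names(cols), col = cols, lty = 1)
```

Building the interval on the link scale keeps the back-transformed limits positive, which an interval computed directly on the response scale would not guarantee.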

Plotting interaction effects in Bayesian models (using rstanarm)

I'm trying to show how the effect of one variable changes with the values of another variable in a Bayesian linear model fitted with rstanarm. I am able to fit the model and take draws from the posterior to look at the estimates for each parameter, but it's not clear how to plot the effect of one variable in the interaction as the other changes, along with the associated uncertainty (i.e. a marginal effects plot). Below is my attempt:
library(rstanarm)
# Set Seed
set.seed(1)
# Generate fake data
w1 <- rbeta(n = 50, shape1 = 2, shape2 = 1.5)
w2 <- rbeta(n = 50, shape1 = 3, shape2 = 2.5)
dat <- data.frame(y = log(w1 / (1-w1)),
x = log(w2 / (1-w2)),
z = seq(1:50))
# Fit linear regression without an intercept:
m1 <- rstanarm::stan_glm(y ~ 0 + x*z,
data = dat,
family = gaussian(),
algorithm = "sampling",
chains = 4,
seed = 123
)
# Create data sets with low values and high values of one of the predictors
dat_lowx <- dat
dat_lowx$x <- 0
dat_highx <- dat
dat_highx$x <- 5
out_low <- rstanarm::posterior_predict(object = m1,
newdata = dat_lowx)
out_high <- rstanarm::posterior_predict(object = m1,
newdata = dat_highx)
# Calculate differences in posterior predictions
mfx <- out_high - out_low
# Somehow get the coefficients for the other predictor?
In this (linear, Gaussian, identity link, no intercept) case,
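mu = beta_x * x + beta_z * z + beta_xz * x * z
= (beta_x + beta_xz * z) * x + beta_z * z
= (beta_z + beta_xz * x) * z + beta_x * x
so d mu / d x = beta_x + beta_xz * z and d mu / d z = beta_z + beta_xz * x.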
So, to plot the marginal effect of x or z, you just need an appropriate range of each and the posterior distribution of the coefficients, which you can obtain via
post <- as.data.frame(m1)
Then
dmu_dx <- post[ , 1] + post[ , 3] %*% t(sort(dat$z))
dmu_dz <- post[ , 2] + post[ , 3] %*% t(sort(dat$x))
You can then estimate a marginal effect at each observed value in your data by using something like the code below, which calculates the effect of x on mu at each observed z and the effect of z on mu at each observed x.
colnames(dmu_dx) <- sort(dat$z) # columns of dmu_dx are indexed by z
colnames(dmu_dz) <- round(sort(dat$x), digits = 1) # columns of dmu_dz are indexed by x
bayesplot::mcmc_intervals(dmu_dz)
bayesplot::mcmc_intervals(dmu_dx)
Note that the column names are simply the observed values of the other variable in the interaction in this case.
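The same calculation can be summarised without bayesplot. The sketch below is only an illustration: post is a hypothetical stand-in for as.data.frame(m1), filled with made-up draws rather than output from the fitted model, and it assumes the coefficient columns are named x, z and x:z. It computes beta_x + beta_xz * z for every draw and every z, then plots the posterior median with a 90% interval.

```r
set.seed(1)
# Hypothetical stand-in for as.data.frame(m1): made-up draws, NOT model output
post <- data.frame(x = rnorm(1000, 0.3, 0.1),
                   z = rnorm(1000, 0.02, 0.01),
                   `x:z` = rnorm(1000, -0.01, 0.005),
                   check.names = FALSE)
z_grid <- 1:50  # the values of z used in the example data

# Effect of x at each z: draws in rows, z values in columns
dmu_dx <- post[["x"]] + outer(post[["x:z"]], z_grid)

# Posterior median and 90% interval of the effect at each z
summ <- t(apply(dmu_dx, 2, quantile, probs = c(0.05, 0.5, 0.95)))

plot(z_grid, summ[, "50%"], type = "l", ylim = range(summ),
     xlab = "z", ylab = "Marginal effect of x")
lines(z_grid, summ[, "5%"], lty = 2)
lines(z_grid, summ[, "95%"], lty = 2)
```

With a real fit, replacing the stand-in with post <- as.data.frame(m1) should give the corresponding plot for the model, provided the column names match.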
You could also use either the ggeffects-package, especially for marginal effects; or the sjPlot-package for marginal effects and other plot types (for marginal effects, sjPlot simply wraps the functions from ggeffects).
To plot marginal effects of interactions, use sjPlot::plot_model() with type = "int". Use mdrt.values to define which values to plot for continuous moderator variables, and use ppd to let prediction be based on either the posterior distribution of the linear predictor or draws from posterior predictive distribution.
library(sjPlot)
plot_model(m1, type = "int", terms = c("x", "z"), mdrt.values = "meansd")
plot_model(m1, type = "int", terms = c("x", "z"), mdrt.values = "meansd", ppd = TRUE)
or to plot marginal effects at other specific values, use type = "pred" and specify the values in the terms-argument:
plot_model(m1, type = "pred", terms = c("x", "z [10, 20, 30, 40]"))
# same as:
library(ggeffects)
dat <- ggpredict(m1, terms = c("x", "z [10, 20, 30, 40]"))
plot(dat)
There are more options, and also different ways of customizing the plot appearance. See related help files and package vignettes.

Plotting binomial glm with interactions in numeric variables

I want to know if it is possible to plot a binomial glm with interactions in numeric variables. In my case:
## Artificial data set
set.seed(20)
d <- data.frame(
mating=sample(0:1, 200, replace=T),
behv = scale(rpois(200,10)),
condition = scale(rnorm(200,5))
)
# Fitted binomial GLM
model<-glm(mating ~ behv + condition, data=d, family=binomial)
summary(model)
In a situation where behv and condition are significant in the model
#Plotting first for behv
x<-d$behv ###Take behv values
x2<-rep(mean(d$condition),nrow(d)) ##Fixed mean condition
# Points
plot(d$mating~d$behv)
#Curve
curve(exp(model$coefficients[1]+model$coefficients[2]*x+model$coefficients[3]*x2)
/(1+exp(model$coefficients[1]+model$coefficients[2]*x+model$coefficients[3]*x2)))
But this doesn't work! Is there a correct approach?
Thanks
It seems like your desired output is a plot of the conditional means (or best-fit line). You can do this by computing predicted values with the predict function.
I'm going to change your example a bit, to get a nicer looking result.
d$mating <- ifelse(d$behv > 0, rbinom(200, 1, .8), rbinom(200, 1, .2))
model <- glm(mating ~ behv + condition, data = d, family = binomial)
summary(model)
Now, we make a newdata dataframe with your desired values:
newdata <- d
newdata$condition <- mean(newdata$condition)
newdata$yhat <- predict(model, newdata, type = "response")
Finally, we sort newdata by the x-axis variable (if not, we'll get lines that zig-zag all over the plot), and then plot:
newdata <- newdata[order(newdata$behv), ]
plot(newdata$mating ~ newdata$behv)
lines(x = newdata$behv, y = newdata$yhat)
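A variation on the same idea, sketched under the assumption that a smooth curve over the range of behv is wanted rather than predictions at the observed rows: predict on an evenly spaced grid, which needs no sorting. The names grid and by_hand are illustrative. The by-hand line shows that type = "response" is the inverse logit (plogis) of the linear predictor, which is what the original curve() attempt was trying to compute.

```r
set.seed(20)
d <- data.frame(
  behv = as.numeric(scale(rpois(200, 10))),
  condition = as.numeric(scale(rnorm(200, 5)))
)
d$mating <- ifelse(d$behv > 0, rbinom(200, 1, .8), rbinom(200, 1, .2))
model <- glm(mating ~ behv + condition, data = d, family = binomial)

# Evenly spaced behv values, with condition held at its mean
grid <- data.frame(
  behv = seq(min(d$behv), max(d$behv), length.out = 100),
  condition = mean(d$condition)
)
grid$yhat <- predict(model, grid, type = "response")

# The same fitted probabilities computed by hand from the coefficients
b <- coef(model)
by_hand <- plogis(b[1] + b[2] * grid$behv + b[3] * grid$condition)

plot(mating ~ behv, data = d)
lines(yhat ~ behv, data = grid)
```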

R: Formula with multiple Conditions and Categorized Surface Plot

I want to make 3D plots for linear regression models in R: I wish to display the surface of the regression plane of a linear model.
I have 2 continuous variables (say AGE, HEIGHT) and 2 factors (SEX, ALLERGIC). I want to display the predicted values of the LM w.r.t. the 2 continuous variables conditioned on the specified levels of each factor, e.g.
ILLNESS = AGE|{SEX==MALE + ALLERGIC==YES} + HEIGHT|{SEX==MALE + ALLERGIC==YES} +
AGE|{SEX==MALE + ALLERGIC==YES}*HEIGHT|{SEX==MALE + ALLERGIC==YES}
This is the outcome I have in mind:
First question: Is there a function that makes this easy?
Second question: If not, how can I write formulas that condition on more than one factor level?
First, let's make some sample input data to have something to test with.
set.seed(15)
dd <- data.frame(
sex = sample(c("M","F"), 200, replace=T),
allergic = sample(c("YES","NO"), 200, replace=T),
age = runif(200, 18,65),
height = rnorm(200, 6, 2)
)
expit <- function(x) exp(x)/(exp(x)+1)
dd <- transform(dd,
illness=expit(-1+(sex=="M")*.8-0.025*age*ifelse(sex=="M",-1,1)+.16*height*ifelse(allergic=="YES",-1,1)+rnorm(200))>.5
)
Now we define the set of values we want to predict over
gg<-expand.grid(sex=c("M","F"), allergic=c("YES","NO"))
vv<-expand.grid(age=18:65, height=3:9)
and then we fit a model, and use the predict function to calculate the response for each point on the surface we wish to plot.
mm <- glm(illness~sex+allergic+age+height, dd, family=binomial)
pd<-do.call(rbind, Map(function(sex, allergic) {
nd <- cbind(vv, sex=sex, allergic=allergic)
cbind(nd, pred=predict(mm, nd, type="response"))
}, sex=gg$sex, allergic=gg$allergic))
Finally, we can use lattice to plot the data
library(lattice)
wireframe(pred~age+height|sex+allergic, pd, drape=TRUE)
which gives us a grid of four wireframe surfaces, one for each combination of sex and allergic.
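The prediction grid can also be built in one step. The sketch below is an alternative to the Map/rbind construction, not the answer's own code: a single expand.grid() call crosses the continuous values with the factor levels, and predict() is called once on the result. It reuses the data-generating recipe from above, with plogis() in place of the hand-written expit(); the wireframe call only runs if lattice is installed.

```r
set.seed(15)
dd <- data.frame(
  sex = sample(c("M", "F"), 200, replace = TRUE),
  allergic = sample(c("YES", "NO"), 200, replace = TRUE),
  age = runif(200, 18, 65),
  height = rnorm(200, 6, 2)
)
# Same recipe as above; plogis() is the built-in inverse logit
dd$illness <- plogis(-1 + (dd$sex == "M") * .8 -
                       0.025 * dd$age * ifelse(dd$sex == "M", -1, 1) +
                       .16 * dd$height * ifelse(dd$allergic == "YES", -1, 1) +
                       rnorm(200)) > .5
mm <- glm(illness ~ sex + allergic + age + height, data = dd, family = binomial)

# One grid covering every age x height x sex x allergic combination
pd <- expand.grid(age = 18:65, height = 3:9,
                  sex = c("M", "F"), allergic = c("YES", "NO"))
pd$pred <- predict(mm, pd, type = "response")

# One surface per sex/allergic combination (skipped if lattice is missing)
if (requireNamespace("lattice", quietly = TRUE)) {
  print(lattice::wireframe(pred ~ age + height | sex + allergic, data = pd, drape = TRUE))
}
```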

project a linear regression hyper plane to a 2d plot (abline-like)

I have this code
factors<-read.csv("India_Factors.csv",header=TRUE)
marketfactor<-factors[,4]
sizefactor<-factors[,5]
valuefactor<-factors[,6]
library(tseries) # provides get.hist.quote()
dati<-get.hist.quote("SI", quote = "AdjClose", compression = "m")
returns<-diff(dati)
regression<-lm(returns ~ marketfactor + sizefactor + valuefactor,na.action=na.omit)
which fits a multiple linear regression.
I want to plot the returns against one factor on a 2D plane (this is trivial, of course), with the projection of the linear regression hyperplane for that specific factor superimposed. To be clearer, the result should look like this: wolfram demonstrations (see the snapshots).
Any help will be greatly appreciated.
Thank you for your time and have a nice weekend.
Giorgio.
The points in my comment notwithstanding, here is the canonical way to generate output from a fitted model in R for combinations of predictors. It really isn't clear what the plots you want are showing, but the ones that make sense to me are partial plots, where one variable is varied over its range whilst holding the others at some common value. Here I use the sample mean when holding a variable constant.
First some dummy data, with only two covariates, but this extends to any number
set.seed(1)
dat <- data.frame(y = rnorm(100))
dat <- transform(dat,
x1 = 0.2 + (0.4 * y) + rnorm(100),
x2 = 2.4 + (2.3 * y) + rnorm(100))
Fit the regression model
mod <- lm(y ~ x1 + x2, data = dat)
Next some data values to predict at using the model. You could do all variables in a single prediction and then subset the resulting object to plot only the relevant rows. Alternatively, more clearly (though more verbose), you can deal with each variable separately. Below I create two data frames, one per covariate in the model. In a data frame I generate 100 values over the range of the covariate being varied, and repeat the mean value of the other covariate(s).
pdatx1 <- with(dat, data.frame(x1 = seq(min(x1), max(x1), length = 100),
x2 = rep(mean(x2), 100)))
pdatx2 <- with(dat, data.frame(x1 = rep(mean(x1), 100),
x2 = seq(min(x2), max(x2), length = 100)))
In the linear regression with straight lines, you really don't need 100 values --- the two end points of the range of the covariate will do. However for models where the fitted function is not linear you need to predict at more locations.
Next, use the model to predict at these data points
pdatx1 <- transform(pdatx1, yhat = predict(mod, pdatx1))
pdatx2 <- transform(pdatx2, yhat = predict(mod, pdatx2))
Now we are ready to draw the partial plots. First compute a range for the y axis. Again, it is mostly redundant here, but if you are adding confidence intervals you will need to include their values below,
ylim <- range(pdatx1$yhat, pdatx2$yhat, dat$y)
To plot (here putting two figures on the same plot device) we can use the following code
layout(matrix(1:2, ncol = 2))
plot(y ~ x1, data = dat, ylim = ylim)
lines(yhat ~ x1, data = pdatx1, col = "red", lwd = 2)
plot(y ~ x2, data = dat, ylim = ylim)
lines(yhat ~ x2, data = pdatx2, col = "red", lwd = 2)
layout(1)
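Which produces two side-by-side panels, one per covariate, each showing the data with the fitted partial line in red.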
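The confidence intervals mentioned above can be added with predict(..., interval = "confidence"). The following is a minimal sketch for the x1 panel only, assuming a 95% level; it returns the columns fit, lwr and upr, and the interval limits are included when computing ylim so the band is not clipped.

```r
set.seed(1)
dat <- data.frame(y = rnorm(100))
dat <- transform(dat,
                 x1 = 0.2 + (0.4 * y) + rnorm(100),
                 x2 = 2.4 + (2.3 * y) + rnorm(100))
mod <- lm(y ~ x1 + x2, data = dat)

# Vary x1 over its range, hold x2 at its mean
pdatx1 <- with(dat, data.frame(x1 = seq(min(x1), max(x1), length.out = 100),
                               x2 = mean(x2)))

# Fitted values with lower and upper 95% confidence limits
ci <- predict(mod, pdatx1, interval = "confidence", level = 0.95)
pdatx1 <- cbind(pdatx1, ci)

# The y range must cover the data and the interval limits
ylim <- range(dat$y, pdatx1$lwr, pdatx1$upr)
plot(y ~ x1, data = dat, ylim = ylim)
lines(fit ~ x1, data = pdatx1, col = "red", lwd = 2)
lines(lwr ~ x1, data = pdatx1, col = "red", lty = 2)
lines(upr ~ x1, data = pdatx1, col = "red", lty = 2)
```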
