I am using the cat_plot function from the 'interactions' package in R (which is a wrapper for ggplot) to plot a 2-way interaction with 2 categorical variables. I can do this easily using the code below (reprex from the "diamonds" dataset)
require(interactions)
require(ggplot2) # provides the diamonds data and facet_wrap()
data("diamonds")
m <- glm(price ~ cut*color, data = diamonds)
cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1")
This produces the following graph
However, what I would like is to have a faceted graph with each of the cuts presented separately, to make it visually easier to interpret. This can be done for 3-way interactions using the facet.modx = TRUE command, but when I try this with only a 2-way interaction with cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1", facet.modx = TRUE) I get the following error
Error in prep_data(model = model, pred = pred, modx = modx, pred.values = pred.values, :
formal argument "facet.modx" matched by multiple actual arguments
Is there a way to easily facet the graph for 2 way interactions? My real-life dataset is actually a glmer model so I would prefer to stay within the "interactions" package if possible.
EDIT: Based on the suggestion from @stefan I tried cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1") + facet_wrap(~cut), which produced the graph below. This is almost exactly what I want, except that each panel seems to keep the other cuts on the x-axis and just removes their bars. Ideally, the colours would be on the x-axis instead.
EDIT 2:
I have recreated the problem using data which is more similar to what I am actually working with, with a binary outcome, random effects from glmer etc.
require(lme4)
require(interactions)
set.seed(123)
id <- rep(1:150, each = 4)
condition <- rep(c("a", "b", "c"), each = 4, times = 50)
cat_mod <- rep(c("cat_1", "cat_2", "cat_3", "cat_4"), each = 1, length.out = 600)
control_mod <- rep(c("control_1", "control_2"), each = 4, length.out = 600)
binary_choice <- rbinom(600, 1, 0.5)
simdat <- data.frame(id, condition, cat_mod, binary_choice, control_mod)
m <- glmer(binary_choice ~ condition*cat_mod + control_mod + (1 | id), family=binomial, data = simdat)
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar", colors = "Set1")
I would like to preserve the response scale on the y-axis, and the model accounting for the random intercept, which is why I was trying to avoid using ggplot directly, as the interactions package is already built to accommodate glmms, which is super convenient.
SOLVED
Following the suggestion from @RStam, I modified the code slightly so that all y-axes share the same scale, and blanked out the duplicate facet labels at the bottom.
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar", colors = "Set1") +
scale_x_discrete(labels = c(a = " ", b = " ", c = " ")) +
facet_wrap(condition~., scales= "free_x")
This was the final result
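As an alternative to blanking the labels through scale_x_discrete(), the axis text can be hidden with theme(). The sketch below is only an illustration: toy and its prob values are made up to stand in for the predictions that cat_plot() draws, and the same facet_wrap() and theme() calls could be added to the cat_plot() object, since it is a ggplot.

```r
library(ggplot2)

# Toy stand-in for the predicted probabilities (values are invented)
toy <- expand.grid(condition = c("a", "b", "c"),
                   cat_mod = paste0("cat_", 1:4))
toy$prob <- seq(0.3, 0.7, length.out = nrow(toy))

p <- ggplot(toy, aes(x = condition, y = prob, fill = cat_mod)) +
  geom_col(position = "dodge") +
  # free_x drops the unused conditions in each panel; y stays shared
  facet_wrap(~ condition, scales = "free_x") +
  # hide the x labels that merely repeat the facet strip
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
```

This keeps the x scale untouched, so nothing needs updating if the factor levels change.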
Original Answer
cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1") +
facet_wrap(~cut, scales = "free_x")
Edit 1
Since that still didn't resolve your issue, I've updated my answer. This should resolve the issue you are having.
library(tidyverse)
ggplot(diamonds, aes(x=color,y=price, fill = color)) +
geom_col() + facet_wrap(~cut, scales = "free")
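One caveat with geom_col() on the raw diamonds rows: it stacks every row, so the bar heights are sums of price rather than averages. Here is a minimal sketch of an alternative, assuming cell means are what is wanted. For a Gaussian model with the full cut*color interaction, the fitted values equal those cell means. cell_means is a name introduced here for illustration.

```r
library(ggplot2)

# Mean price for every cut/color combination
cell_means <- aggregate(price ~ cut + color, data = diamonds, FUN = mean)

# Fitted values from the saturated two-way model for the same cells
m <- glm(price ~ cut * color, data = diamonds)
cell_means$fitted <- unname(predict(m, newdata = cell_means))

# One bar per cell, colours on the x-axis, one panel per cut
p <- ggplot(cell_means, aes(x = color, y = price, fill = color)) +
  geom_col() +
  facet_wrap(~ cut)
```

This shortcut does not carry over to the glmer model, where the predictions depend on the link function and the random effects.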
Edit 2
Using your new data and the interactions package I found a rather unpleasant 'hack' using scale_x_discrete() but it should give the desired outcome.
library(interactions)
library(lme4)
set.seed(123)
id <- rep(1:150, each = 4)
condition <- rep(c("a", "b", "c"), each = 4, times = 50)
cat_mod <- rep(c("cat_1", "cat_2", "cat_3", "cat_4"), each =
1, length.out = 600)
control_mod <- rep(c("control_1", "control_2"), each = 4,
length.out = 600)
binary_choice <- rbinom(600, 1, 0.5)
simdat <- data.frame(id, condition, cat_mod, binary_choice,
control_mod)
m <- glmer(binary_choice ~ condition*cat_mod + control_mod +
(1 | id), family=binomial, data = simdat)
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar",
colors = "Set1") + scale_x_discrete() +
facet_wrap(condition~., scales= "free")
Reproduced from this code:
library(haven)
library(survey)
library(dplyr)
nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))
# Rename variables into something more readable
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA
nhanesAnalysis <- nhanesDemo %>%
mutate(LowIncome = case_when(
INDFMIN2 < 40 ~ TRUE,
TRUE ~ FALSE
)) %>%
# Select the necessary columns
select(INDFMIN2, LowIncome, persWeight, psu, strata)
# Set up the design
nhanesDesign <- svydesign(id = ~psu,
strata = ~strata,
weights = ~persWeight,
nest = TRUE,
data = nhanesAnalysis)
svyhist(~log10(INDFMIN2), design=nhanesDesign, main = '')
How do I color the histogram by independent variable, say, LowIncome? I want to have two separate histograms, one for each value of LowIncome. Unfortunately I picked a bad example, but I want them to be see-through in case their values overlap.
If you want to plot a histogram from your survey design, you can get its data from model.frame (this is what svyhist does under the hood). To get the histogram filled by group, you could use this data frame inside ggplot:
library(ggplot2)
ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
geom_histogram(alpha = 0.5, color = "gray60", breaks = 0:20 / 10) +
theme_classic()
Edit
As Thomas Lumley points out, this does not incorporate sampling weights, so if you wanted this you could do:
ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
geom_histogram(aes(weight = persWeight), alpha = 0.5,
color = "gray60", breaks = 0:20 / 10) +
theme_classic()
To demonstrate this approach works, we can replicate Thomas's approach in ggplot using the data example from svyhist. To get the uneven bin sizes (if this is desired), we need two histogram layers, though I'm guessing this would not be required for most use-cases.
ggplot(model.frame(dstrat), aes(enroll)) +
geom_histogram(aes(fill = "E", weight = pw, y = after_stat(density)),
data = subset(model.frame(dstrat), stype == "E"),
breaks = 0:35 * 100,
position = "identity", col = "gray50") +
geom_histogram(aes(fill = "Not E", weight = pw, y = after_stat(density)),
data = subset(model.frame(dstrat), stype != "E"),
position = "identity", col = "gray50",
breaks = 0:7 * 500) +
scale_fill_manual(NULL, values = c("#00880020", "#88000020")) +
theme_classic()
You can't just extract the data and use ggplot, because that won't use the weights and so misses the whole point of svyhist. You can use the add=TRUE argument, though. You do need to set the x and y axis ranges correctly to make sure the whole plot is visible.
Using the data example from ?svyhist
svyhist(~enroll, subset(dstrat,stype=="E"), col="#00880020",ylim=c(0,0.003),xlim=c(0,3500))
svyhist(~enroll, subset(dstrat,stype!="E"), col="#88000020",add=TRUE)
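The dstrat examples in both answers use the object without defining it. A minimal setup sketch, assuming the api data and stratified design from the survey package's documentation (mf is a name introduced here):

```r
library(survey)

# Stratified sample of California schools shipped with the survey package
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# The model frame of the design carries the sampling weights as column pw,
# which is what aes(weight = pw) picks up in the ggplot version
mf <- model.frame(dstrat)
```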
I am trying to plot my data in ggplot with the line type determined by the p-values of the smooth lines (i.e., a dashed line if the regression is not significant, and a solid line when it is). Before posting this question I tried an answer from this forum, but the existing answers mostly deal with labels, not the line itself.
Below is my failing code with sample data. Thanks in advance for your kind help.
library(plyr)
library(ggplot2)
dat <- data.frame(id = 1: 100,
x = rnorm(100,2,0.5),
y = rnorm(100, 20, 5),
varA = rep(letters[1:4], 25),
varB = factor(sample(c(50,100,150), 100, TRUE)))
pvdat <- ddply(dat,.(varA,varB), function(df) data.frame(pvalue=format(signif(summary(lm(y~x,data=df))[[4]][2, 4], 2),scientific=-2),
lty = ifelse(summary(lm(y~x,data=df))[[4]][2, 4] > 0.05, 0, 1)))
ggplot(data= dat, aes(x = x, y = y, col = as.factor(varB))) + geom_smooth(method = "lm", aes(linetype = pvdat$lty)) + facet_grid(. ~ as.factor(varA), scale = "free_x")
There are two problems here:
pvdat$lty is numeric (continuous), but linetype requires a factor
pvdat has one row per varA/varB group (at most 12) but dat has 100 rows, so ggplot does not know how to make a mapping between the two
To change your numeric column to a factor, you need as.factor(), and to make the mapping you can use the merge() function to make a single data frame with the values from pvdat mapped for each element of dat. Putting these together:
ggplot(data= merge(dat,pvdat,by = c("varA","varB")), aes(x = x, y = y, col = as.factor(varB))) + geom_smooth(method = "lm", aes(linetype = as.factor(lty))) + facet_grid(. ~ as.factor(varA), scale = "free_x")
will solve your problem.
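One detail to watch: with as.factor(lty), ggplot2 assigns line types by factor level order, not by the numeric codes, so the solid/dashed assignment may not match the intent. The sketch below is one way to make the mapping explicit, under my own assumptions: it uses base R instead of plyr, and the sig labels and the handling of groups too small to test are choices made here.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(100, 2, 0.5), y = rnorm(100, 20, 5),
                  varA = rep(letters[1:4], 25),
                  varB = factor(sample(c(50, 100, 150), 100, TRUE)))

# One row per varA/varB group with a significance label for the slope
groups <- split(dat, list(dat$varA, dat$varB), drop = TRUE)
pvdat <- do.call(rbind, lapply(groups, function(df) {
  co <- summary(lm(y ~ x, data = df))$coefficients
  p <- if ("x" %in% rownames(co)) co["x", "Pr(>|t|)"] else NA_real_
  # groups too small to test are treated as not significant
  sig <- if (is.na(p) || p > 0.05) "p > 0.05" else "p <= 0.05"
  data.frame(varA = df$varA[1], varB = df$varB[1], sig = sig)
}))

# Attach the label to every observation, then map it to linetype by name
plot_dat <- merge(dat, pvdat, by = c("varA", "varB"))
p <- ggplot(plot_dat, aes(x = x, y = y, colour = varB)) +
  geom_smooth(method = "lm", formula = y ~ x, aes(linetype = sig)) +
  scale_linetype_manual(values = c("p <= 0.05" = "solid",
                                   "p > 0.05" = "dashed")) +
  facet_grid(. ~ varA, scales = "free_x")
```

Naming the values in scale_linetype_manual() ties each label to a line type regardless of level order.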
I recently came across a problem with ggplot2::geom_density that I am not able to solve. I am trying to visualise the density of a variable and compare it to a constant. To plot the density, I am using ggplot2::geom_density. The variable whose density I am plotting, however, happens to be constant (this time):
df <- data.frame(matrix(1,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(5,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
This is OK and something I would expect. But, when I shift this distribution to the far right, I get a plot like this:
df <- data.frame(matrix(71,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(75,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
which probably means that the kernel estimation is still taking 0 as the centre of the distribution (right?).
Is there any way to circumvent this? I would like to see a plot like the one above, only with the centre of the kernel density at 71 and the vline at 75.
Thanks
Well I am not sure what the code does, but I suspect the geom_density primitive was not designed for a case where the values are all the same, and it is making some assumptions about the distribution that are not what you expect. Here is some code and a plot that sheds some light:
# Generate 10 data sets with 100 constant values from 0 to 90
# and then merge them into a single dataframe
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100),facet=v)
}
df <- do.call(rbind,dfs)
# facet plot them
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
So it is not doing what you thought it was, but it is also probably not doing what you want. You could of course make it "translation-invariant" (almost) by adding some noise like this for example:
set.seed(1234)
noise <- rnorm(100, 0, 1e-3)
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100)+noise,facet=v)
}
df <- do.call(rbind,dfs)
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
Note that the estimated density looks a bit different in each panel. geom_density itself is deterministic (it relies on stats::density()), so these small differences are more likely an artifact of evaluating such a narrow peak on a fixed grid than of any random seed.
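One likely reason the shape changes with location is the default bandwidth. stats::density(), which geom_density() uses, picks it with bw.nrd0(), and for constant data (zero standard deviation and IQR) that rule falls back to a multiple of abs(x[1]). The kernel therefore gets wider the further the constant is from zero. Here is a minimal sketch, assuming a fixed bandwidth is acceptable; bw_at_1 and bw_at_71 are names introduced here.

```r
library(ggplot2)

# Default bandwidth for 100 identical values at 1 and at 71
bw_at_1  <- bw.nrd0(rep(1, 100))
bw_at_71 <- bw.nrd0(rep(71, 100))  # 71 times wider than bw_at_1

df  <- data.frame(dummy = rep(71, 100))
dfV <- data.frame(latent = 75)

# Passing bw explicitly makes the curve independent of where the data sit
p <- ggplot() +
  geom_density(data = df, aes(x = dummy, colour = "s"),
               fill = "#FF6666", alpha = 0.2, bw = bw_at_1) +
  geom_vline(data = dfV, aes(xintercept = latent, colour = "ls"))
```

With the bandwidth pinned, the plot for 71 and 75 should look like the one for 1 and 5, just shifted along the x-axis.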
I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_line() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
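library(ggplot2)
library(reshape2) # melt()
library(dplyr)    # filter()
library(grid)     # pushViewport(), viewport()
xseq <- 1:100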
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
We set linetype = "solid" for the first item and "blank" for others (no line).
Similarly for first item we set no shape (NA) and for others we will set whatever shape we need (I just put 7 and 8 there for an example). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you to choose correct shapes for your needs.
If you are happy with dots then you can use my_shapes = c(NA,16,16) and scale_shape_manual(...) is not needed.
my_shapes = c(NA,7,8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c("red", "green", "blue"),
guide = guide_legend(override.aes = list(
linetype = c("solid", "blank","blank"),
shape = my_shapes))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines
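One possibly more automated variant is to keep a single long data frame, draw both geoms on all of it, and let named manual scales switch the point or the line off per series. This is only a sketch under my own assumptions: long is built by hand here instead of with melt(), and the shapes 16 and 17 are arbitrary.

```r
library(ggplot2)

set.seed(1)
xseq <- 1:100
y1 <- rnorm(100, mean = 0.5, sd = 2)
y2 <- rnorm(100, mean = 1, sd = 0.5)
fit <- unname(fitted(lm(y1 ~ xseq)))

# Long format: one row per x value and series
series <- c("ypoints", "ylines", "ypoints2")
long <- data.frame(
  x = rep(xseq, times = 3),
  variable = factor(rep(series, each = 100), levels = series),
  value = c(y1, fit, y2)
)

p <- ggplot(long, aes(x = x, y = value, colour = variable,
                      shape = variable, linetype = variable)) +
  geom_point(na.rm = TRUE) +  # rows whose shape is NA are dropped quietly
  geom_line() +
  # named values: no dependence on the order of the legend entries
  scale_shape_manual(values = c(ypoints = 16, ylines = NA, ypoints2 = 17)) +
  scale_linetype_manual(values = c(ypoints = "blank", ylines = "solid",
                                   ypoints2 = "blank"))
```

Because all three scales share the same title and labels, ggplot2 merges them into a single legend whose keys show only the glyph each series actually uses.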
I need to add a legend of the two lines (best fit line and 45 degree line) on TOP of my two plots. Sorry I don't know how to add plots! Please please please help me, I really appreciate it!!!!
Here is an example
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)
Here is another approach which uses aesthetic mapping to string constants to identify different groups and create a legend.
First an alternate way to create your test data (and naming it DF instead of newdata)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes call, which means each of the two values used will be mapped to a different colour and a guide (legend) will be created for that mapping.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
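# intercept and slope go inside aes(): recent ggplot2 versions ignore the
# mapping (and so the legend entry) when they are passed as plain arguments
geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one"), size = 1) +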
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("")
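Since the question asks for the legend on top of the panels, here is a self-contained sketch of the same idea with the legend moved there. The colours and the seed are arbitrary choices made here, and intercept and slope are placed inside aes() so that geom_abline() keeps its colour mapping.

```r
library(ggplot2)

set.seed(42)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
                 xvariable = 1:15,
                 yvariable = 2 * (1:15) + rnorm(15, 0, 2))

p <- ggplot(DF, aes(x = xvariable, y = yvariable)) +
  geom_point(size = 3) +
  # string constants inside aes() create the legend entries
  geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one")) +
  geom_smooth(aes(colour = "best fit"), method = "lm",
              formula = y ~ x, se = FALSE) +
  facet_wrap(~ type) +
  # named values tie each label to its colour
  scale_colour_manual(NULL, values = c("best fit" = "blue",
                                       "one-to-one" = "red")) +
  theme(legend.position = "top")
```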
Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for the fitted line (grp 1, labelled "abline") and the one-to-one line (grp 2), set colours by group
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("abline", "one-to-one"),
name = "") +
theme(legend.position = "top")
# If you would rather stack the two keys in the legend, you can add:
# guide = guide_legend(direction = "vertical")
#...as argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.