How to vary line and ribbon colours in a facet_grid - r

I'm hoping someone can help with this plotting problem I have. The data can be found here.
Basically I want to plot a line (mean) and its associated confidence interval (lower, upper) for 4 models I have tested. I want to facet on the Cat_Auth variable, which has 4 categories (so 4 plots). The first 'model' is actually just the mean of the sample data and I don't want a CI for this (NA values specified in the data - not sure if this is the correct thing to do).
I can get the plot some way there with:
newdata <- read.csv("data.csv", header=T)
ggplot(newdata, aes(x = Affil_Max, y = Mean)) +
geom_line(data = newdata, aes(), colour = "blue") +
geom_ribbon(data = newdata, alpha = .5, aes(ymin = Lower, ymax = Upper, group = Model, fill = Model)) +
facet_grid(.~ Cat_Auth)
But I'd like different coloured lines and shaded ribbons for each model (e.g. a red mean line and red shaded ribbon for model 2, green for model 3 etc). Also, I can't figure out why the blue line corresponding to the first set of mean values is disjointed as it is.
Would be really grateful for any assistance!

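Map both colour and group to Model in geom_line(), and keep fill mapped to Model in geom_ribbon(). Without a group, geom_line() draws a single line through the points of all four models in each facet, which is why your blue line looks disjointed. Try this: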
library(dplyr)
library(ggplot2)
newdata %>%
mutate(Model = as.factor(Model)) %>%
ggplot(aes(Affil_Max, Mean)) +
geom_line(aes(color = Model, group = Model)) +
geom_ribbon(alpha = .5, aes(ymin = Lower, ymax = Upper,
group = Model, fill = Model)) +
facet_grid(. ~ Cat_Auth)
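Since the original data.csv is not available here, the following is only a sketch of the same idea on made-up data. The data frame sim and all of its values are assumptions for illustration; only the column names follow the question. Model 1 gets NA bounds, as in the question, and no ribbon should be drawn for it.

```r
library(ggplot2)

# Hypothetical stand-in for data.csv: 4 models x 4 categories x 10 x-values
set.seed(42)
sim <- expand.grid(Affil_Max = 1:10,
                   Model = 1:4,
                   Cat_Auth = paste("Cat", 1:4))
sim$Mean  <- 0.2 * sim$Affil_Max + sim$Model + rnorm(nrow(sim), sd = 0.1)
sim$Lower <- sim$Mean - 0.5
sim$Upper <- sim$Mean + 0.5

# Model 1 is just the sample mean, so it has no interval
sim$Lower[sim$Model == 1] <- NA
sim$Upper[sim$Model == 1] <- NA
sim$Model <- factor(sim$Model)

p <- ggplot(sim, aes(Affil_Max, Mean, group = Model)) +
  # ribbons first so that the mean lines are drawn on top of them
  geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = Model), alpha = 0.3) +
  geom_line(aes(colour = Model)) +
  facet_grid(. ~ Cat_Auth)

print(p)
```

Because colour and fill are both mapped to Model, each model's line and ribbon should share a hue; scale_colour_manual() and scale_fill_manual() can be added if specific colours (red, green, ...) are wanted.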

Related

How to plot two dashed regression lines using GGPlot

I am currently trying to draw two dashed lines using the ggplot function. The graph shows two regression lines belonging to two different factor groups. I've been able to make one of the lines dashed, but I am having trouble getting the other line to have dashes. Any help would be greatly appreciated.
coli_means %>%
ggplot(aes(time, mean_heartrate, group = treatment)) +
geom_point( aes(group = treatment, color = treatment)) +
geom_smooth(aes(method = "loess", linetype = treatment, se = FALSE,
group = treatment, color = treatment, show.legend = TRUE))
I feel I am missing one simple input. Thanks.
What you need to do is use scale_linetype_manual() and then tell it that both the treatment groups require a dashed line.
Let's start with a reproducible example:
# reproducible example:
library(dplyr)    # for %>%
library(ggplot2)
set.seed(0)
time <- rep(1:100,2)
treatment <- c(rep("A",100), rep("B",100))
mean_heartrate <- c(rnorm(100,60,2), rnorm(100,80,2))
coli_means <- data.frame(time, treatment, mean_heartrate)
# ggplot
coli_means %>%
ggplot(aes(x = time, y = mean_heartrate)) +
geom_point(aes(color = treatment)) +
geom_smooth(aes(linetype = treatment, color = treatment))+
scale_linetype_manual(values = c('dashed','dashed'))
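An alternative sketch, assuming both lines should simply share one dash pattern: set linetype as a constant outside aes() instead of mapping it to treatment. The same goes for method and se, which are settings of geom_smooth() rather than aesthetics, so they also belong outside aes(). The data below is the same simulated coli_means as above.

```r
library(ggplot2)

set.seed(0)
coli_means <- data.frame(
  time = rep(1:100, 2),
  treatment = rep(c("A", "B"), each = 100),
  mean_heartrate = c(rnorm(100, 60, 2), rnorm(100, 80, 2))
)

p <- ggplot(coli_means, aes(time, mean_heartrate, colour = treatment)) +
  geom_point() +
  # method, se and linetype are fixed settings here, so they stay outside aes();
  # colour = treatment (inherited from ggplot()) still splits the fit by group
  geom_smooth(method = "loess", se = FALSE, linetype = "dashed")

print(p)
```

With this version there is no separate linetype legend, because linetype is no longer mapped to a variable.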

Add legend using geom_point and geom_smooth from different dataset

I am struggling to get the right legend for a geom_point plot with a loess regression when two data sets are used.
I have a data set summarizing activity over a day. On the same graph I plot all the activity recorded per hour and per day, a regression curve smoothed with a loess function, and the mean for each hour across all days.
To be more precise, here is the first version of the code and the graph it returns. It has no legend but is otherwise exactly what I expected:
# first graph, which is given what I expected but with no legend
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = 20, size = 3) +
geom_smooth(method = "loess", span = 0.2, color = "red", fill = "blue")
and the graph (grey: all the data, per hour and per day; red curve: the loess regression; blue dots: the mean for each hour):
When I tried to add a legend, I could not get one that explains both kinds of dots (data in grey, mean in blue) and the loess curve (in red). Below are some examples of what I tried.
# second graph: what I expected, plus the legend I wanted for the loess curve,
# but without a legend for the dots
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = "blue", size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_identity(name = "legend model", guide = "legend",
labels = "loess regression \n with confidence interval")
This gave me the right legend, but for the curve only.
Another attempt:
# I tried to combine both data sets into a single one as follows, but it did not
# work at all, and I really do not understand how legends work in ggplot2
# compared to base plots
A <- rbind(dat1, dat2)
p <- ggplot(A, aes(x = Heure, y = value, color = variable)) +
geom_point(data = subset(A, variable == "data"), size = 1) +
geom_point(data = subset(A, variable == "Moy"), size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_manual(name = "légende",
labels = c("Data", "Moy", "loess regression \n with confidence interval"),
values = c("darkgray", "royalblue", "red"))
All the legend settings seem to be mixed together in a "weird" way: there is a grey dot covered by a grey line, then the same in blue and in red (for the 3 labels), and every key has a blue background:
If you need to label the mean, you might need to be a bit creative, because it is not so easy to add a legend manually in ggplot.
I simulate something that looks like your data below.
library(dplyr)    # for %>%, group_by() and summarise()
library(ggplot2)
dat1 = data.frame(
Hour = rep(1:24,each=10),
value = c(rnorm(60,0,1),rnorm(60,2,1),rnorm(60,1,1),rnorm(60,-1,1))
)
# classify this as raw data
dat1$Data = "Raw"
# calculate mean like you did
dat2 <- dat1 %>% group_by(Hour) %>% summarise(value=mean(value))
# classify this as mean
dat2$Data = "Mean"
# combine the data frames
plotdat <- rbind(dat1,dat2)
# add a dummy variable, we'll use it later
plotdat$line = "Loess-Smooth"
We make the basic dot plot first:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide="none")
Note that for size we set guide to "none" so that it does not appear in the legend. Now we add the loess smooth. One way to get it into the legend is to map a linetype; since the dummy variable has only one value, you get a single legend entry:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide="none")+
geom_smooth(data=subset(plotdat,Data=="Raw"),
aes(linetype=line),size=1,alpha=0.3,
method = "loess", span = 0.2, color = "red", fill = "blue")
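Another route, sketched here under assumptions rather than taken from the answer above, is to keep the two data sets separate, map a constant label to colour in each layer, and then tidy the mixed legend keys with override.aes. The data frames raw and hourly_mean and the three label strings are made up for illustration.

```r
library(ggplot2)

set.seed(1)
raw <- data.frame(Hour = rep(1:24, each = 10), value = rnorm(240))
hourly_mean <- aggregate(value ~ Hour, data = raw, FUN = mean)

keys <- c("Raw data", "Hourly mean", "Loess fit")  # hypothetical legend labels

p <- ggplot(raw, aes(Hour, value)) +
  geom_point(aes(colour = "Raw data"), size = 1) +
  geom_point(data = hourly_mean, aes(colour = "Hourly mean"), size = 3) +
  geom_smooth(aes(colour = "Loess fit"), method = "loess", span = 0.2,
              fill = "blue", alpha = 0.3) +
  scale_colour_manual(
    name = NULL,
    breaks = keys,  # fixes the key order so the override vectors line up
    values = c("Raw data" = "darkgray", "Hourly mean" = "blue",
               "Loess fit" = "red")
  ) +
  # one entry per key, in the order given by breaks:
  # dots only for the first two keys, line and band only for the third
  guides(colour = guide_legend(override.aes = list(
    shape = c(16, 16, NA),
    linetype = c("blank", "blank", "solid"),
    fill = c(NA, NA, "blue")
  )))

print(p)
```

The override.aes vectors are what should stop every key from showing a dot, a line and a blue background at once; how the keys render exactly may differ a little between ggplot2 versions.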

Creating a ggplot() from scratch in R to illustrate results

I'm a bit new to R and this is the first time I'd like to use ggplot(). My aim is to create a few plots that will look like the template below, which is an output from the package effects for those who know it:
Given this data:
     Average     Error     Area
1: 0.4407528 0.1853854 Loliondo
2: 0.2895050 0.1945540 Seronera
How can I replicate the plot seen in the image with labels, error bars as in Error and the line connecting both Average points?
I hope somebody can put me on the right track and then I will go from there for other data I have.
Any help is appreciated!
Using ggplot2::geom_errorbar you can add error bars by first deriving your ymin and ymax.
library(dplyr)    # for %>% and mutate()
library(ggplot2)
df <- tibble::tribble(~Average, ~Error, ~Area,
0.4407528, 0.1853854, "Loliondo",
0.2895050, 0.1945540, "Seronera")
dfnew <- df %>%
mutate(ymin = Average - Error,
ymax = Average + Error)
p <- ggplot(data = dfnew, aes(x = Area, y = Average)) +
geom_point(colour = "blue") + geom_line(aes(group = 1), colour = "blue") +
geom_errorbar(aes(x = Area, ymin = ymin, ymax = ymax), colour = "purple")
Here's a quick and dirty one that is similar to what was just posted:
library(dplyr)    # also provides tibble() and %>%
library(ggplot2)
df <-
tibble(
average = c(0.44, 0.29),
error = c(0.185, 0.195),
area = c("Loliondo", "Seronera")
)
df %>%
ggplot(aes(x = area)) +
geom_line(
aes(y = average, group = 1),
color = "blue"
) +
geom_errorbar(
aes(ymin = average - 0.5 * error, ymax = average + 0.5 * error),
color = "purple",
width = 0.1
)
The trickiest part here is the group = 1 segment, which you need for the line to be drawn with factors on the x axis.
The aes(x = area) goes up top because it's used in both geoms, while the y, group, ymin, and ymax are used only locally. The color and width arguments appear outside of the aes() call since they are used for appearance modifications.
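The question also asked about labels, which neither snippet sets explicitly. A minimal sketch, assuming Error is meant as a symmetric half-width around Average (as in the first answer); the label and title strings are placeholders:

```r
library(ggplot2)

df <- data.frame(
  Area = c("Loliondo", "Seronera"),
  Average = c(0.4407528, 0.2895050),
  Error = c(0.1853854, 0.1945540)
)

p <- ggplot(df, aes(x = Area, y = Average)) +
  geom_errorbar(aes(ymin = Average - Error, ymax = Average + Error),
                colour = "purple", width = 0.1) +
  # group = 1 joins the two points although the x axis is discrete
  geom_line(aes(group = 1), colour = "blue") +
  geom_point(colour = "blue") +
  # placeholder label text; replace with the wording you need
  labs(x = "Area", y = "Average (+/- Error)", title = "Average by area") +
  theme_bw()

print(p)
```

theme_bw() is only there to get closer to the plain look of the effects package output; any other theme would work as well.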

Plotting standard error bars

I have a long-format dataset with 3 variables. I'm plotting two of the variables and faceting by the third, using ggplot2. I'd like to plot the standard error bars of the observations in each facet too, but I have no idea how. Does anyone know how to do this?
Here's a picture of what I've got. I'd like to have the standard error bars on each facet. Thanks!!
Edit: here's some example data and the plot.
data <- data.frame(rep(c("1","2","3","4","5","6","7","8","9","10",
"11","12","13","14","15","16","17","18","19","20",
"21","22","23","24","25","26","27","28","29","30",
"31","32"), 2),
rep(c("a","b","c","d","e","f","g","h","i","j","k","l"), 32),
rnorm(n = 384))
colnames(data) <- c("estado","sector","VA")
ggplot(data, aes(x = estado, y = VA, col = sector)) +
facet_grid(.~sector) +
geom_point()
If all you want is the mean & standard error bar associated with each "estado"-"sector" combination, you can leave ggplot to do all the work, by replacing the geom_point() line with stat_summary():
ggplot(data,
aes(x = estado, y = VA, col = sector)) +
facet_grid(. ~ sector) +
stat_summary(fun.data = mean_se)
See ?mean_se from the ggplot2 package for more details on the function. The default parameter option gives you the mean as well as the range for 1 standard error above & below the mean.
If you want to show the original points, just add back the geom_point() line. (Though I think the plot would be rather cluttered for the reader, in that case...)
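To make it concrete what mean_se() returns, here is a small check against a hand computation. The vector x is arbitrary example data.

```r
library(ggplot2)

x <- c(2, 4, 6, 8)

# standard error of the mean, computed by hand
manual_se <- sd(x) / sqrt(length(x))

# mean_se() returns a one-row data frame with columns y, ymin and ymax
res <- mean_se(x)

# both comparisons should be TRUE
isTRUE(all.equal(res$ymin, mean(x) - manual_se))
isTRUE(all.equal(res$ymax, mean(x) + manual_se))
```

stat_summary(fun.data = mean_se) applies this function to the y values at each x position within each facet, which is where the plotted point and bar come from.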
Maybe you could try something like below?
set.seed(1)
library(dplyr)
library(ggplot2)
dat = data.frame(estado = factor(rep(1:32, 2)),
sector = rep(letters[1:12], 32),
VA = rnorm(384))
se = function(x) {
sd(x)/sqrt(length(x))
}
dat_sum = dat %>% group_by(estado, sector) %>%
summarise(mu = mean(VA), se = se(VA))
dat_plot = full_join(dat, dat_sum)
ggplot(dat_plot, aes(estado, y = VA, color = sector)) +
geom_jitter() +
geom_errorbar(aes(estado, y = mu, color = sector,
ymin = mu - se, ymax = mu + se)) +
facet_grid(.~sector)

How to combine stat_ecdf with geom_ribbon?

I am trying to draw an ECDF of some data with a "confidence interval" represented via a shaded region using ggplot2. I am having trouble combining geom_ribbon() with stat_ecdf() to achieve the effect I am after.
Consider the following example data:
set.seed(1)
dat <- data.frame(variable = rlnorm(100) + 2)
dat <- transform(dat, lower = variable - 2, upper = variable + 2)
> head(dat)
variable lower upper
1 2.534484 0.5344838 4.534484
2 3.201587 1.2015872 5.201587
3 2.433602 0.4336018 4.433602
4 6.929713 4.9297132 8.929713
5 3.390284 1.3902836 5.390284
6 2.440225 0.4402254 4.440225
I am able to produce an ECDF of variable using
library("ggplot2")
ggplot(dat, aes(x = variable)) +
geom_step(stat = "ecdf")
However I am unable to use lower and upper as the ymin and ymax aesthetics of geom_ribbon() to superimpose the confidence interval on the plot as another layer. I have tried:
ggplot(dat, aes(x = variable)) +
geom_ribbon(aes(ymin = lower, ymax = upper), stat = "ecdf") +
geom_step(stat = "ecdf")
but this raises the following error
Error: geom_ribbon requires the following missing aesthetics: ymin, ymax
Is there a way to coax geom_ribbon() into working with stat_ecdf() to produce a shaded confidence interval? Or, can anyone suggest an alternative means of adding a shaded polygon defined by lower and upper as a layer to the ECDF plot?
Try this (a bit of a shot in the dark):
ggplot(dat, aes(x = variable)) +
geom_ribbon(aes(x = variable,ymin = ..y..-2,ymax = ..y..+2), stat = "ecdf",alpha=0.2) +
geom_step(stat = "ecdf")
OK, so that's not the same thing as what you are trying to do, but it should explain what's going on. The stat returns a data frame with just the original x and the computed y, so I think that's all you have to work with, i.e. stat_ecdf only computes the cumulative distribution function for a single x at a time.
The only other thing I can think of is the obvious, calculating the lower and upper separately, something like this:
l <- ecdf(dat$lower)
u <- ecdf(dat$upper)
v <- ecdf(dat$variable)
dat$lower1 <- l(dat$variable)
dat$upper1 <- u(dat$variable)
dat$variable1 <- v(dat$variable)
ggplot(dat,aes(x = variable)) +
geom_step(aes(y = variable1)) +
geom_ribbon(aes(ymin = upper1,ymax = lower1),alpha = 0.2)
I'm not sure exactly how you want to reflect the CI, but ggplot_build() lets you get the generated data back from the plot; you can then overplot whatever you like.
This chart shows:
red = original ribbon
blue = takes the original CI vectors and applies to the ecdf curve
green = calculates the ecdf of upper and lower series and plots
g<-ggplot(dat, aes(x = variable)) +
geom_step(stat = "ecdf") +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.5, fill="red")
inside<-ggplot_build(g)
matched<-merge(inside$data[[1]],data.frame(x=dat$variable,dat$lower,dat$upper),by=("x"))
g +
geom_ribbon(data=matched, aes(x = x,
ymin = y + dat.upper-x,
ymax = y - x + dat.lower),
alpha=0.5, fill="blue") +
geom_ribbon(data=matched, aes(x = x,
ymin = ecdf(dat.lower)(x),
ymax = ecdf(dat.upper)(x)),
alpha=0.5, fill="green")
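To see what the ECDF layer actually computes before overplotting, it can help to inspect the layer data directly. This is only a sketch on the example data from the question; layer_data() is used as a shortcut for pulling the first layer out of ggplot_build().

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(variable = rlnorm(100) + 2)

p <- ggplot(dat, aes(x = variable)) +
  geom_step(stat = "ecdf")

# data for layer 1 after the stat has run
ecdf_layer <- layer_data(p, 1)

names(ecdf_layer)  # exact column names vary a little between ggplot2 versions
head(ecdf_layer)
```

The lower and upper columns are not part of this layer data because they were never mapped to an aesthetic in that layer, so they have to be joined back in by x, as the merge() call above does.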
