Add legend using geom_point and geom_smooth from different dataset - r

I am struggling to set up the correct legend for a geom_point plot with a loess regression when two data sets are used.
I have a data set that summarizes activity over a day. On the same graph I plot all the activity recorded per hour and per day, a regression curve smoothed with a loess function, and the mean of each hour across all days.
To be more precise, here is the first version of the code and the graph it returns, without a legend, which is exactly what I expected:
# first graph: gives what I expected, but with no legend
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = 20, size = 3) +
geom_smooth(method = "loess", span = 0.2, color = "red", fill = "blue")
and the graph (the grey dots are all the data, per hour and per day; the red curve is the loess regression; the blue dots are the means for each hour):
When I tried to add the legend, I could not get one that explains both kinds of dots (data in grey, mean in blue) and the loess curve (in red). Below are some examples of what I tried.
# second graph: gives what I expected plus the legend I wanted for the loess
# curve, but without the legend for the dots
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = "blue", size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_identity(name = "legend model", guide = "legend",
labels = "loess regression \n with confidence interval")
I obtained the correct legend for the curve only.
Another attempt:
# I tried to combine both data sets into a single one as follows, but it did not
# work at all, and I really do not understand how legends work in ggplot2
# compared to base plots
A <- rbind(dat1, dat2)
p <- ggplot(A, aes(x = Heure, y = value, color = variable)) +
geom_point(data = subset(A, variable == "data"), size = 1) +
geom_point(data = subset(A, variable == "Moy"), size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_manual(name = "légende",
labels = c("Data", "Moy", "loess regression \n with confidence interval"),
values = c("darkgray", "royalblue", "red"))
It appears that all the legend settings are mixed together in a "weird" way: there is a grey dot covered by a grey line, and then the same in blue and in red (for the 3 labels). All of them have a background filled in blue:

If you need to label the mean, you might need to be a bit creative, because it's not so easy to add a legend manually in ggplot.
I simulate something that looks like your data below.
library(dplyr)   # for %>%, group_by() and summarise()
library(ggplot2)
dat1 = data.frame(
Hour = rep(1:24,each=10),
value = c(rnorm(60,0,1),rnorm(60,2,1),rnorm(60,1,1),rnorm(60,-1,1))
)
# classify this as raw data
dat1$Data = "Raw"
# calculate mean like you did
dat2 <- dat1 %>% group_by(Hour) %>% summarise(value=mean(value))
# classify this as mean
dat2$Data = "Mean"
# combine the data frames
plotdat <- rbind(dat1,dat2)
# add a dummy variable, we'll use it later
plotdat$line = "Loess-Smooth"
We make the basic dot plot first:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide="none")
Note that for size we set guide to "none" so that it does not appear in the legend (older ggplot2 versions also accepted guide=FALSE, which is now deprecated). Now we add the loess smooth. One way to get it into the legend is to map a linetype; since there is only one group, the legend will have just one entry:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide="none")+
# note the ==: with a single = no rows are filtered and the means are smoothed too
geom_smooth(data=subset(plotdat,Data=="Raw"),
aes(linetype=line),size=1,alpha=0.3,
method = "loess", span = 0.2, color = "red", fill = "blue")
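As an alternative to the dummy linetype, the sketch below puts a constant label inside aes() for each layer and lets a single manual colour scale build the legend. The data are simulated, and the names raw, means and legend_cols are made up for illustration. The override.aes part is an assumption about how you want the keys drawn (dots for the two point layers, a line for the smooth); recent ggplot2 versions may already draw only the relevant glyph per key.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-ins for dat1 (raw values) and dat2 (hourly means)
raw <- data.frame(Hour = rep(1:24, each = 10), value = rnorm(240))
means <- aggregate(value ~ Hour, data = raw, FUN = mean)

# Legend labels (names) and their colours (values); hypothetical labels
legend_cols <- c("Raw data" = "darkgray",
                 "Hourly mean" = "blue",
                 "Loess fit" = "red")

p <- ggplot(raw, aes(x = Hour, y = value)) +
  # a constant string inside aes() creates one legend entry per layer
  geom_point(aes(colour = "Raw data"), size = 1) +
  geom_point(data = means, aes(colour = "Hourly mean"), size = 3) +
  geom_smooth(aes(colour = "Loess fit"),
              method = "loess", span = 0.2, fill = "blue") +
  scale_colour_manual(
    name = NULL,
    values = legend_cols,
    breaks = names(legend_cols),   # fixes the order of the legend keys
    guide = guide_legend(override.aes = list(
      shape = c(16, 16, NA),                     # dots for the point layers only
      linetype = c("blank", "blank", "solid"),   # a line for the smooth only
      fill = NA                                  # no ribbon behind the keys
    ))
  )
```

Printing p draws the plot. The vectors in override.aes follow the order given in breaks, so they need to be kept in sync if the labels change.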

Related

Scale density plots in ggpairs based on total datapoints?

I'm plotting correlations in ggpairs and am splitting the data based on a filter.
The density plots are normalising themselves on the number of data points in each filtered group. I would like them to normalise on the total number of data points in the entire data set. Essentially, I would like to be able to have the sum of the individual density plots be equal to the density plot of the entire dataset.
I know this probably breaks the definition of "density plot", but this is a presentation style I'd like to explore.
In plain ggplot, I can do this by adding y=..count.. to the aesthetic, but ggpairs doesn't accept x or y aesthetics.
Some sample code and plots:
library(tidyverse) # ggplot2 and the %>% pipe
library(GGally)    # ggpairs() and wrap()
set.seed(1234)
group = as.numeric(cut(runif(100),c(0,1/2,1),c(1,2)))
x = rnorm(100,group,1)
x[group == 1] = (x[group == 1])^2
y = (2 * x) + rnorm(100,0,0.1)
data = data.frame(group = as.factor(group), x = x, y = y)
#plot of everything
data %>%
ggplot(aes(x)) +
geom_density(color = "black", alpha = 0.7)
#the scaling I want
data %>%
ggplot(aes(x,y=..count.., fill=group)) +
geom_density(color = "black", alpha = 0.7)
#the scaling I get
data %>%
ggplot(aes(x, fill=group)) +
geom_density(color = "black", alpha = 0.7)
data %>% ggpairs(., columns = 2:3,
mapping = ggplot2::aes(colour=group),
lower = list(continuous = wrap("smooth", alpha = 0.5, size=1.0)),
diag = list(continuous = wrap("densityDiag", alpha=0.5 ))
)
Are there any suggestions that don't involve reformatting the entire dataset?
I am not sure I understand the question, but if the densities of both groups plus the density of the entire data set are to be plotted, it can easily be done by:
Getting rid of the grouping aesthetic, in this case fill, for the overall curve.
Adding another call to geom_density, this time with inherit.aes = FALSE so that the previous aesthetics are not inherited.
Plotting the densities together.
library(tidyverse)
data %>%
ggplot(aes(x, y=..count.., fill = group)) +
geom_density(color = "black", alpha = 0.7) +
geom_density(mapping = aes(x, y = ..count..),
inherit.aes = FALSE)
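To see what the count scaling does numerically, here is a minimal base R sketch on simulated data (not the data from the question). It assumes that the count value is simply the kernel density multiplied by the number of observations in the group; the helper count_curve and the grid bounds are made up for illustration.

```r
set.seed(1234)
group <- sample(1:2, 100, replace = TRUE)
x <- rnorm(100, mean = group, sd = 1)

# Evaluate every curve on the same grid, padded so the tails are covered
grid_lo <- min(x) - 3
grid_hi <- max(x) + 3
n_grid <- 512

# Hypothetical helper: a density estimate rescaled to counts
count_curve <- function(v) {
  d <- density(v, from = grid_lo, to = grid_hi, n = n_grid)
  data.frame(x = d$x, count = d$y * length(v))
}

curve_g1 <- count_curve(x[group == 1])
curve_g2 <- count_curve(x[group == 2])

# The area under each count curve is roughly the group size,
# so the two areas together are roughly the total sample size
step <- (grid_hi - grid_lo) / (n_grid - 1)
area_g1 <- sum(curve_g1$count) * step
area_g2 <- sum(curve_g2$count) * step
```

Because each group gets its own bandwidth, the sum of the two curves matches the overall curve in area but not necessarily point by point.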

ggplot Loess Line Color Scale from 3rd Variable

I am trying to apply a color scale to a loess line based on a 3rd variable (Temperature). I've only been able to get the color to vary based on either the variable in the x or y axis.
set.seed(1938)
a2 <- data.frame(year = seq(0, 100, length.out = 1000),
values = cumsum(rnorm(1000)),
temperature = cumsum(rnorm(1000)))
library(ggplot2)
ggplot(a2, aes(x = year, y = values, color = values)) +
geom_line(size = 0.5) +
geom_smooth(aes(color = ..y..), size = 1.5, se = FALSE, method = 'loess') +
scale_colour_gradient2(low = "blue", mid = "yellow", high = "red",
midpoint = median(a2$values)) +
theme_bw()
This code produces the following plot, but I would like the loess line color to vary based on the temperature variable instead.
I tried using
color = loess(temperature ~ values, a2)
but I got an error of
"Error: Aesthetics must be either length 1 or the same as the data (1000): colour, x, y"
Thank you for any and all help! I appreciate it.
You can't do that when you calculate the loess with geom_smooth, since it only has access to:
"..y.. which is the vector of y-values internally calculated by geom_smooth to create the regression curve"
(quoted from "Is it possible to apply color gradient to geom_smooth with ggplot in R?")
To do this, you should calculate the loess curve manually with loess and then plot it with geom_line:
set.seed(1938)
a2 <- data.frame(year = seq(0,100,length.out=1000),
values = cumsum(rnorm(1000)),
temperature = cumsum(rnorm(1000)))
# Calculate loess curve and add values to data.frame
a2$predict <- predict(loess(values~year, data = a2))
ggplot(a2, aes(x = year, y = values)) +
geom_line(size = 0.5) +
geom_line(aes(y = predict, color = temperature), size = 2) +
scale_colour_gradient2(low = "blue", mid = "yellow" , high = "red",
# colour is mapped to temperature, so the midpoint is its median
midpoint=median(a2$temperature)) +
theme_bw()
The downside of this is that it won't fill in gaps in your data as nicely as geom_smooth.
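One possible way to soften that downside is to predict the loess curve on a regular grid of years instead of only at the observed points. The sketch below is an assumption about how you might do it, not part of the original answer: the grid data frame is made up, and temperature is linearly interpolated onto the grid with approx(), which may or may not be appropriate for your data.

```r
set.seed(1938)
a2 <- data.frame(year = seq(0, 100, length.out = 1000),
                 values = cumsum(rnorm(1000)),
                 temperature = cumsum(rnorm(1000)))

# Fit the loess once, then predict on an evenly spaced grid
fit <- loess(values ~ year, data = a2)
grid <- data.frame(year = seq(min(a2$year), max(a2$year), length.out = 200))
grid$predict <- predict(fit, newdata = grid)

# Assumption: temperature between observations can be linearly interpolated
grid$temperature <- approx(a2$year, a2$temperature, xout = grid$year)$y
```

The grid data frame can then be passed to a geom_line(data = grid, aes(y = predict, color = temperature)) layer in place of the predictions stored in a2.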

ggplot2 legend: combine discrete colors and continuous point size

There are similar posts to this, namely here and here, but they address instances where both point color and size are continuous. Is it possible to:
Combine discrete colors and continuous point size within a single legend?
Within that same legend, add a description to each point in place of the numerical break label?
Toy data
xval = as.numeric(c("2.2", "3.7","1.3"))
yval = as.numeric(c("0.3", "0.3", "0.2"))
color.group = c("blue", "red", "blue")
point.size = as.numeric(c("200", "11", "100"))
description = c("descript1", "descript2", "descript3")
df = data.frame(xval, yval, color.group, point.size, description)
ggplot(df, aes(x=xval, y=yval, size=point.size)) +
geom_point(color = df$color.group) +
scale_size_continuous(limits=c(0, 200), breaks=seq(0, 200, by=50))
Doing what you originally asked - continuous + discrete in a single legend - in general doesn't seem to be possible even conceptually. The only sensible thing would be to have two legends for size, with a different color for each legend.
Now let's consider having a single legend. Given your "In my case, each unique combination of point size + color is associated with a description.", it sounds like there are very few possible point sizes. In that case, you could use both scales as discrete. But I believe even that is not enough as you use different variables for size and color scales. A solution then would be to create a single factor variable with all possible combinations of color.group and point.size. In particular,
df <- data.frame(xval, yval, f = interaction(color.group, point.size), description)
ggplot(df, aes(x = xval, y = yval, size = f, color = f)) +
geom_point() + scale_color_discrete(labels = 1:3) +
scale_size_discrete(labels = 1:3)
Here 1:3 are those descriptions that you want, and you may also set the colors the way you like. For instance,
ggplot(df, aes(x = xval, y = yval, size = f, color = f)) +
geom_point() + scale_size_discrete(labels = 1:3) +
scale_color_manual(labels = 1:3, values = c("red", "blue", "green"))
However, we may also exploit color.group by using
ggplot(df, aes(x = xval, y = yval, size = f, color = f)) +
geom_point() + scale_size_discrete(labels = 1:3) +
scale_color_manual(labels = 1:3, values = gsub("(.*)\\..*", "\\1", sort(df$f)))
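To make the last step less opaque, this small sketch shows what interaction() and the regular expression produce for the toy data. The names used and key_cols are made up for illustration, and sub() with a simpler pattern stands in for the gsub() call above.

```r
color.group <- c("blue", "red", "blue")
point.size <- c(200, 11, 100)

# One factor level per colour/size combination, written as "colour.size"
f <- interaction(color.group, point.size)

# Only the combinations that occur in the data end up in the legend
used <- levels(droplevels(f))

# Strip everything from the first dot onwards to recover the colour names
key_cols <- sub("\\..*$", "", used)
key_cols
# "red" "blue" "blue"
```

The colours come out in level order (sizes 11, 100, 200), which is the order of the legend keys, so they line up with the labels passed to the scales.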

separate colours for ggplot geom_point and geom_seg

I am trying to use ggplot() to plot a betadisper object mod1 so that I can better control the colours.
I extracted the centroids from mod1 and I am using geom_point() to plot the yearly replicates for each dune, geom_segment() to plot the lines for each dune-star, and a second geom_point() statement to plot the centroids.
When I plot this using
scale_colour_manual(values=cols, guide= FALSE)
it only changes the colour of the first geom_points and the geom_seg but not the centroids.
How do I control the colour of each component separately such that the dune points are coloured by cols, the segments are coloured by cols1 and the centroids use cols2?
I'd also like to change the colour of the black outline for each centroid to cols1.
library(plyr)    # rename() and join() used below
library(vegan)
library(ggplot2)
cols = c("blue","red")
cols1 = c("green","dark orange")
cols2 = c("purple","yellow")
data(dune)
sites = data.frame(year = rep(c(1:5), times= 4), dune = rep(c(1:4),each=5), dune.type = rep(c("A","B"),each=10))
distances <- vegdist(dune, method = "bray")
#create Betadispersion model on betad (effectively a PCoA)
mod1 <- with(sites, betadisper(distances, dune, type = "centroid"))
s = scores(mod1)
# Get points
pnt_sites = as.data.frame(s$sites)
pnt_sites = cbind(pnt_sites, sites)
# Get centroids
pnt_centroids = as.data.frame(s$centroids)
pnt_centroids$dune = rownames(pnt_centroids)
pnt_centroids$dune.type = rep(c("A","B"),each=2)
# Calculate segments
seg = pnt_sites[, c("PCoA1", "PCoA2", "dune")]
tmp = rename(pnt_centroids, c("PCoA1" = "PCoA1_ctr", "PCoA2" = "PCoA2_ctr"))
seg = join(seg, tmp, "dune")
# Plot
ggplot() +
geom_point(
data = pnt_sites,
aes(x = PCoA1, y = PCoA2, colour = dune.type, shape = dune.type),
size = 2
) +
geom_segment(
data = seg,
aes(x = PCoA1, y = PCoA2, xend = PCoA1_ctr, yend = PCoA2_ctr, colour =
dune.type)
) +
geom_point(
data = pnt_centroids,
aes(x = PCoA1, y = PCoA2, fill = dune.type),
size = 3, shape = 21
) +
scale_colour_manual(values=cols, guide= FALSE) +
coord_equal() +
theme_bw()
You can only specify scale_colour_manual() once per plot, not separately for each geom_point() call, so you need to combine your centroids and sites into one data frame (adding variables for centroid/site and for centroid A/centroid B/site A/site B), then plot them as a single geom_point() layer.
#combine centroids and points into one dataframe
pnt_centroids$year = NA
pnt_centroids$data.type = "centroid"
pnt_sites$data.type = "site"
sites_centroids <- rbind(pnt_centroids, pnt_sites)
sites_centroids$type <- paste(sites_centroids$data.type, sites_centroids$dune.type)
When you define the vectors of colors to use for scale_fill_manual and scale_colour_manual, each will have 6 values, to match the number of levels you have (4 point types plus 2 segment types). Your site points and segments do not have a fill attribute, so fill will be ignored when plotting those points and segments, but you still need to define 6 colors in scale_fill_manual so that the filled points for the centroids plot properly.
#change the cols vector definitions at the beginning of code to this
cols.fill <- c("purple", "yellow", "purple", "yellow", "purple", "yellow")
cols.colour <- c("green", "dark orange", "green", "dark orange", "blue", "red")
Specify the new colour, fill, and shape scales in the plot code like this:
# Plot
ggplot() +
geom_segment( #segment must go before point so points are in front of lines
data = seg,
aes(x = PCoA1, y = PCoA2, xend = PCoA1_ctr, yend = PCoA2_ctr, colour = dune.type)) +
geom_point(
data = sites_centroids,
aes(x = PCoA1, y = PCoA2, colour = type, fill = type, shape = type), size = 2) +
scale_colour_manual(values = cols.colour) +
scale_fill_manual(values = cols.fill, guide = "none") +
scale_shape_manual(values = c(21, 21, 16, 17)) +
coord_equal() +
theme_bw()
Here is the result. The legend gets a bit busy; it may be better to remove it and use text annotations to label the dune types.
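Since the unnamed colour vectors above depend on the order in which the levels are sorted, a slightly more robust variant is to name each colour after the level it belongs to. This is only a sketch: the level names are assumed to be the dune types "A" and "B" for the segments plus the pasted "centroid"/"site" labels, and cols_named is a made-up name.

```r
# Levels that end up on the colour scale: segment levels plus point levels
segment_levels <- c("A", "B")
point_levels <- paste(rep(c("centroid", "site"), each = 2), c("A", "B"))

# Hypothetical named palette: names are levels, values are colours
cols_named <- c("A" = "green", "B" = "darkorange",
                "centroid A" = "green", "centroid B" = "darkorange",
                "site A" = "blue", "site B" = "red")

# Every level on the scale should have exactly one colour
all_levels <- c(segment_levels, point_levels)
missing_levels <- setdiff(all_levels, names(cols_named))
```

Passing scale_colour_manual(values = cols_named) then matches colours to levels by name rather than by position.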

How to add regression line in ggplot wrap

I have trouble adding linear regression lines to my ggplots.
This is how it should look:
This is how it currently looks:
This is my code:
p <- ggplot(data = wage, aes(x = educ, y = lwage, colour = black,
cex = IQ, pch = married, alpha = 0.7)) + geom_jitter()
p1 <- p + facet_grid(urban~experclass) + geom_smooth(se=F,method="lm")
p1 + labs(x = "Education (year)", y = "Log Wage", shape = "Marital status",
colour = "Ethnicity") + guides(alpha = FALSE)
Is the position of my geom_smooth wrong? What I want is only one black regression line for each element of the plot - and not one by layer.
Furthermore what happens when I add a regression line is that the legend symbols change. Especially the IQ legend looks pretty weird. Is there something I did not consider here?
I can try to answer at least one part of your question - which is the part about plotting one regression line instead of two per panel. I don't have your data so I can't fully replicate your problem, but I think this will work.
The aesthetics in your original ggplot() call will be inherited by all the subsequent layers, including the geom_smooth.
What you seem to want is the color aesthetic (which happens to be a grouping identifier) to apply only to the jittered points and not to the line. So you can write your code like this:
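p <- ggplot(data = wage, aes(x = educ, y = lwage,
cex = IQ, pch = married, alpha = 0.7)) +
# the colour mapping (the variable black) applies to the points only
geom_jitter(aes(colour = black))
p1 <- p + facet_grid(urban~experclass) +
# a constant colour for the line, not mapped to a variable
geom_smooth(se=FALSE,method="lm",
colour = "black")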
or, alternatively, as one single ggplot call in a modified style:
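p3 <- ggplot(data = wage,
aes(x = educ, y = lwage,
size = IQ, shape = married, alpha = 0.7)) +
# the colour mapping (the variable black) applies to the points only
geom_jitter(aes(colour = black)) +
# a constant colour for the line, not mapped to a variable
geom_smooth(se=FALSE,method="lm",
colour = "black")+
facet_grid(urban~experclass)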
p3
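If other discrete aesthetics inherited from ggplot() (such as shape) still split the line into several groups, one option is to override the grouping in the smooth layer. The sketch below uses simulated data, since the wage data are not available; wage_sim and all its values are made up, and aes(group = 1) is a suggestion rather than part of the answer above.

```r
library(ggplot2)

set.seed(42)
n <- 200
# Simulated stand-in for the wage data
wage_sim <- data.frame(
  educ = sample(9:18, n, replace = TRUE),
  black = factor(sample(c("no", "yes"), n, replace = TRUE)),
  married = factor(sample(c("no", "yes"), n, replace = TRUE)),
  urban = factor(sample(c("rural", "urban"), n, replace = TRUE))
)
wage_sim$lwage <- 5 + 0.08 * wage_sim$educ + rnorm(n, sd = 0.3)

p_sim <- ggplot(wage_sim, aes(x = educ, y = lwage,
                              colour = black, shape = married)) +
  geom_jitter(alpha = 0.7) +
  # group = 1 forces a single fit per panel; colour is a constant here
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, colour = "black") +
  facet_grid(urban ~ .)

# Count the fitted lines in each panel of the smooth layer
smooth_data <- layer_data(p_sim, 2)
lines_per_panel <- tapply(smooth_data$group, smooth_data$PANEL,
                          function(g) length(unique(g)))
```

Each panel should then contain exactly one black regression line, while the points keep their colour and shape legends.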
