Can't add geom_point to line plot with stat_count - r

I have a df where I have made a nice line plot using stat_count, but when I try to add geom_point it won't work.
Without the last part (geom_point(size=2)) it produces a line plot, but with it I get error:
Don't know how to automatically pick scale for object of type
function. Defaulting to continuous. Error: Column y must be a 1d
atomic vector or a list
df <- data.frame("id" = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
"bowl" = c("red", "red", "red","green", "green", "green",
"green", "green", "red", "red"),
"year"=c(2001:2003, 2002:2003, 2001:2003, 2001:2002))
library(dplyr)
library(ggplot2)
df %>%
ggplot(aes(x=year, y=count, colour=bowl)) +
stat_count(geom = "line",
aes(y=..count..))+
geom_point(size=2)
I suspect there's just a small adjustment to be made, but I can't seem to find it on my own.

There are two possible approaches:
Using stat_count() and specifying geom
Using geom_line() and geom_point(), resp., and specifying stat
There is a difference in the default value for position which will create different plots.
1. Stacked plot of counts (total counts)
As already mentioned by Z.Lin,
library(ggplot2)
ggplot(df, aes(x = year, y = stat(count), colour = bowl)) +
stat_count(geom = "line") +
stat_count(geom = "point")
will create a stacked line and point plot of counts, i.e., the total number of records per year (regardless of bowl):
As of version 3.0.0 of ggplot2 it is possible to use the new stat() function for calculated-aesthetic variables. So, stat(count) replaces ..count..
The same plot is created by
ggplot(df, aes(x = year, y = stat(count), colour = bowl)) +
geom_line(stat = "count", position = "stack") +
geom_point(stat = "count", position = "stack")
but we have to specify explicitly that the counts have to be stacked.
2. Line and point plot of counts by colour
If we want to show the counts per year for each value of bowl separately, we can use
ggplot(df, aes(x = year, y = stat(count), colour = bowl)) +
geom_line(stat = "count") +
geom_point(stat = "count")
which produces a line and point plot for each colour.
This can also be achieved by
ggplot(df, aes(x = year, y = stat(count), colour = bowl)) +
stat_count(geom = "line", position = "identity") +
stat_count(geom = "point", position = "identity")
but now we have to specify explicitly not to stack.
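The effect of the two position defaults can also be checked numerically rather than by eye. The following is a minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat(), the successor to stat()); top_y is a hypothetical helper written for this example, not a ggplot2 function.

```r
library(ggplot2)

df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  bowl = c("red", "red", "red", "green", "green",
           "green", "green", "green", "red", "red"),
  year = c(2001:2003, 2002:2003, 2001:2003, 2001:2002)
)

# Hypothetical helper: build a point layer of counts with the given
# position and return the highest y value drawn in each year.
top_y <- function(position) {
  p <- ggplot(df, aes(x = year, y = after_stat(count), colour = bowl)) +
    geom_point(stat = "count", position = position)
  built <- layer_data(p)          # data as ggplot2 would draw it
  tapply(built$y, built$x, max)
}

top_y("identity")  # largest per-bowl count in each year
top_y("stack")     # stacked height, i.e. all records in each year
```

With position = "stack" the topmost point in each year sits at the total number of records, while position = "identity" leaves each bowl at its own count.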

Related

R drawing baseline in facet_wrap specific to each facet

Let's say I have the following dataset:
set.seed(42)
data <- data.frame(type = sample(LETTERS[1:2], 40, replace = T),
condition = sample(c("Control", "Treatment"), 40, replace = T),
measurement = runif(40))
And I'd like to create the facetted graph:
ggplot(data, aes( x= condition, y = measurement))+
geom_point()+
facet_wrap(~type)
I'd also like to show a baseline (with geom_hline, for example) equal to the mean of the control values (mean(data$measurement[data$condition == "Control"])). But because the control values differ between types (i.e., between the facets of the graph), I can't just calculate one single mean.
Is there any way to specify a different yintercept for geom_hline in each facet?
Something like this, but with the specified yintercept value, calculating the mean values for the control group for each individual facet:
ggplot(data, aes( x= condition, y = measurement))+
geom_point()+
geom_hline(yintercept= mean(data$measurement[data$condition == "Control"]),
linetype="dashed",
color = "red", size=1)+
facet_wrap(~type)
Thanks a lot!
Best regards,
Eugene
You can use stat_summary with fun = mean and geom = "hline", passing only the control subset to the data parameter. You can map yintercept to the y value calculated by the stat.
ggplot(data, aes(x = condition, y = measurement))+
geom_point() +
stat_summary(fun = mean, geom = "hline", aes(yintercept = after_stat(y)),
data = data[data$condition == "Control",], color = "red",
linetype = "dashed") +
facet_wrap(~type)
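An alternative is to compute the per-facet means first and hand them to geom_hline as its own data frame. This is a minimal sketch of that idea, not the answerer's method; baseline is a name chosen for this example.

```r
library(ggplot2)

set.seed(42)
data <- data.frame(
  type        = sample(LETTERS[1:2], 40, replace = TRUE),
  condition   = sample(c("Control", "Treatment"), 40, replace = TRUE),
  measurement = runif(40)
)

# One row per facet: mean of the control measurements within each type.
baseline <- aggregate(measurement ~ type,
                      data = data[data$condition == "Control", ],
                      FUN  = mean)

p <- ggplot(data, aes(x = condition, y = measurement)) +
  geom_point() +
  # `baseline` carries a `type` column, so each line is drawn only in
  # the facet whose type it belongs to.
  geom_hline(data = baseline, aes(yintercept = measurement),
             colour = "red", linetype = "dashed") +
  facet_wrap(~type)
```

Keeping the means in a separate data frame also makes them easy to inspect or reuse, for example in a label layer.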

Increase breaks in discrete legend

I'm currently getting started learning R and I'm focusing on data visualisation.
For this plot, I'm displaying the count of overlapping dots on the map using geom_count which gives me the following graph
As you can see, the legend only contains two elements, namely the size of the dot when 5 data points are overlapping, and the size of it when 10 data points are overlapping. How can I increase the number of breaks that the legend includes? I have been trying to use discrete_x_scale to increase the number of breaks, but I just get lost and can't manage it.
The code for my current graph is simply this
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_count()
I would also like to know how to change the filling color of the dot according to the number of overlapping data points.
You need to modify scale_size, not scale_x:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_count() +
scale_size(breaks = c(2, 4, 6, 8))
To also change the fill colour, you can use a computed aesthetic:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = after_stat(n))) +
geom_count() +
scale_size(breaks = seq(0, 15, 3)) +
scale_color_continuous(breaks = seq(0, 15, 3)) +
guides(size = guide_legend(), color = guide_legend())
Note the guides call: without that, you’d get two separate legends for the size and colour below each other, rather than one merged legend.
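To choose sensible breaks it helps to know which counts actually occur in the plot. A minimal sketch, assuming layer_data() exposes the n column computed by the stat behind geom_count; p and counts are names chosen for this example.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_count()

# `n` is the number of observations at each (cty, hwy) location.
counts <- layer_data(p)$n

range(counts)          # smallest and largest dot in the plot
sort(unique(counts))   # every count that actually occurs
```

Breaks outside range(counts) will not show up in the legend, so the range is a useful guide when writing the breaks vector.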
To change the fill colour as well as the size, try creating an explicit count variable that controls both size and colour:
library(dplyr)
library(ggplot2)
mpg1 <-
mpg %>%
group_by(cty, hwy) %>%
summarise(count = n())
ggplot(data = mpg1, mapping = aes(x = cty, y = hwy, colour = count, size = count))+
geom_point() +
scale_size_continuous(breaks = seq(2, 14, by = 2))+
scale_colour_continuous(breaks = seq(2, 14, by = 2))+
guides(colour = guide_legend(), size = guide_legend())
Note that, to ensure only one merged legend appears, the breaks for size and colour need to be identical.
Created on 2021-04-01 by the reprex package (v1.0.0)

ggplot2 custom legend with multiple geom overlays: guide_legend() confusion

I want to create a customized legend that distinguishes two plotted geoms using appropriate shape and color. I see that guide_legend() should be involved, but my legend shows both shapes overlaid on each other for both components of the legend. What is the right way to build these individual legend components using distinct shapes and colors? Thank you.
library(dplyr)
library(ggplot2)
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
ggplot() +
geom_point(data = df, aes(x = year, y = annualNitrogen, fill="green"), shape=24, color="green", size = 4) +
geom_point(data = df, aes(x = year, y = annualPotassium, fill="blue"), color="blue", shape=21, size = 4) +
guides(fill = guide_legend(override.aes = list(color=c("green", "blue"))),
shape = guide_legend(override.aes = list(shape=c(21, 24)))
) +
scale_fill_manual(name = 'cumulative\nmaterial',
values = c("blue"="blue" , "green"="green" ),
labels = c("potassium" , "nitrogen") ) +
theme_bw() +
theme(legend.position="bottom")
Here it helps to transform to "long" format which is more in line with how ggplot is designed to be used when separating factor levels within a single time series.
This allows us to map shape and color directly, rather than having to manually assign different values to multiple plotted series, like you do in your question.
library(tidyverse)
df %>%
pivot_longer(-year, names_to = "element") %>%
ggplot(aes(x=year, y = value, fill = element, shape = element, color = element)) +
geom_point(size = 4)+
scale_color_manual(values = c("green", "blue"))
Put your df into the long format that ggplot likes with tidyr::gather. You should only use one geom_point for this; you don't need separate geoms for separate variables. You can then map both shape and colour to the variable in one call to geom_point.
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
df <- tidyr::gather(df, key = 'variable', value='value', annualNitrogen, annualPotassium)
ggplot(df) +
geom_point(aes(x = year, y = value, shape = variable, color = variable)) +
scale_color_manual(
name = 'cumulative\nmaterial',
values = c(
"annualPotassium" = "blue",
"annualNitrogen" = "green"),
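labels = c("annualNitrogen" = "nitrogen", "annualPotassium" = "potassium")) + # named, so each label matches its own level
guides(shape = "none") # hide the separate shape legend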
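If the filled triangle and circle from the question (shapes 24 and 21) should appear in a single legend, one option is to give the shape, colour and fill scales the same title and labels so that ggplot2 merges them. This is a minimal sketch under that assumption; long, legend_title and legend_labels are names chosen for this example.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(year            = seq(2010, 2020, 1),
                 annualNitrogen  = seq(100, 200, 10),
                 annualPotassium = seq(500, 600, 10))

long <- pivot_longer(df, -year, names_to = "element", values_to = "value")

# Shared title and labels: scales with identical name and labels are
# drawn as one legend.
legend_title  <- "cumulative\nmaterial"
legend_labels <- c(annualNitrogen = "nitrogen", annualPotassium = "potassium")
element_cols  <- c(annualNitrogen = "green", annualPotassium = "blue")

p <- ggplot(long, aes(x = year, y = value,
                      shape = element, colour = element, fill = element)) +
  geom_point(size = 4) +
  scale_shape_manual(name = legend_title, labels = legend_labels,
                     values = c(annualNitrogen = 24, annualPotassium = 21)) +
  scale_colour_manual(name = legend_title, labels = legend_labels,
                      values = element_cols) +
  scale_fill_manual(name = legend_title, labels = legend_labels,
                    values = element_cols) +
  theme_bw() +
  theme(legend.position = "bottom")
```

Naming every entry in values and labels ties each shape, colour and label to its level explicitly, so the result does not depend on the alphabetical order of the levels.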

ggplot2 - using two different color scales for same fill in overlayed plots

A very similar question to the one asked here. However, in that situation the fill parameter for the two plots are different. For my situation the fill parameter is the same for both plots, but I want different color schemes.
I would like to manually change the color in the boxplots and the scatter plots (for example making the boxes white and the points colored).
Example:
require(dplyr)
require(ggplot2)
n<-4*3*10
myvalues<- rexp((n))
days <- ntile(rexp(n),4)
doses <- ntile(rexp(n), 3)
test <- data.frame(values =myvalues,
day = factor(days, levels = unique(days)),
dose = factor(doses, levels = unique(doses)))
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot( aes(fill = dose))+
geom_point( aes(fill = dose), alpha = 0.4,
position = position_jitterdodge())
produces a plot like this:
Using 'scale_fill_manual()' overwrites the aesthetic on both the boxplot and the scatterplot.
I have found a hack by adding 'colour' to geom_point and then when I use scale_fill_manual() the scatter point colors are not changed:
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(fill = dose), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = factor(test$dose)),
position = position_jitterdodge(jitter.width = 0.1))+
scale_fill_manual(values = c('white', 'white', 'white'))
Are there more efficient ways of getting the same result?
You can use group to set the different boxplots. No need to set the fill and then overwrite it:
ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(group = interaction(day, dose)), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = dose),
position = position_jitterdodge(jitter.width = 0.1))
And you should never use data$column inside aes - just use the bare column. Using data$column will work in simple cases, but will break whenever there are stat layers or facets.

Bar plot in ggplot2 with ordered bars and manually specified colours

I want to generate a simple bar plot with ggplot2 with the bars ordered by the y-value and the colours manually defined. Here is what I tried:
df <- data.frame(c("a", "b", "c"), c(2, 3, 1))
colnames(df) <- c("shop", "revenue")
ggplot(data = df, aes(x = reorder(shop, revenue), y = revenue, fill = shop)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("blue", "yellow", "black")) +
theme_minimal()
The problem is: the colours are wrongly ordered (black, blue and yellow instead of blue, yellow and black as stated in scale_fill_manual). How to fix this?
With scale_fill_manual you assign colors to levels in your data.
At the same time, you use reorder(shop, revenue) in the definition of aes, which orders the bars from left to right by ascending revenue. The third and last colour, "black", was assigned to c, which is now on the left-hand side because it has the smallest revenue.
You could reorder the colours to work around this:
ggplot(data = df, aes(x = reorder(shop, revenue), y = revenue, fill = shop)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("yellow", "black", "blue")) + # CHANGED
theme_minimal()
Or as @JeroenBoeye suggested:
ggplot(data = df, aes(x = reorder(shop, revenue), y = revenue, fill = shop)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("c" = "blue", "a" = "yellow", "b" = "black")) + # Jeroen Boeye's suggestion
theme_minimal()
Please let me know whether this solves your problem.
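The mismatch becomes visible when the two orderings are printed side by side. The sketch below is one way to derive the named colour vector from the bar order instead of typing it by hand; palette_by_shop and the other variable names are chosen for this example, and geom_col() is used as the equivalent of geom_bar(stat = "identity").

```r
library(ggplot2)

df <- data.frame(shop = c("a", "b", "c"), revenue = c(2, 3, 1))

# Order in which an unnamed `values` vector is consumed: the levels of shop.
fill_levels <- levels(factor(df$shop))
# Order of the bars on the x axis: levels after reordering by revenue.
bar_order <- levels(reorder(df$shop, df$revenue))

# Name the colours by bar order, so the first colour goes to the leftmost bar.
palette_by_shop <- setNames(c("blue", "yellow", "black"), bar_order)

p <- ggplot(df, aes(x = reorder(shop, revenue), y = revenue, fill = shop)) +
  geom_col() +
  scale_fill_manual(values = palette_by_shop) +
  theme_minimal()
```

Because the vector is named, the colours stay attached to the right shops even if the revenues, and with them the bar order, change later.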
