I'm currently getting started learning R and I'm focusing on data visualisation.
For this plot, I'm displaying the count of overlapping dots using geom_count, which gives me the following graph:
As you can see, the legend contains only two entries: the dot size for 5 overlapping data points and the dot size for 10. How can I increase the number of breaks shown in the legend? I have been trying to use scale_x_discrete to add more breaks, but I just get lost and can't manage it.
The code for my current graph is simply this
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_count()
I would also like to know how to change the filling color of the dot according to the number of overlapping data points.
You need to modify scale_size, not scale_x:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_count() +
scale_size(breaks = c(2, 4, 6, 8))
To also change the fill colour, you can use a computed aesthetic:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = after_stat(n))) +
geom_count() +
scale_size(breaks = seq(0, 15, 3)) +
scale_color_continuous(breaks = seq(0, 15, 3)) +
guides(size = guide_legend(), color = guide_legend())
Note the guides call: without that, you’d get two separate legends for the size and colour below each other, rather than one merged legend.
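When picking breaks, it can help to look at the counts that geom_count actually computes. The following is a minimal sketch, assuming ggplot2's layer_data() helper and the computed variable n that geom_count produces; the object names p and counts are only illustrative.

```r
library(ggplot2)

# Build the basic count plot without drawing it
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_count()

# layer_data() returns the data computed for a layer;
# for geom_count() this includes the count per location in column n
counts <- layer_data(p)$n

# The range of the counts shows which break values can appear in the legend
range(counts)
```

Breaks that fall outside this range should simply not be shown in the legend, so a sequence such as seq(0, 15, 3) is safe even if its end points are never reached.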
To change the fill colour as well as the size, try creating an explicit count variable and using it to control both size and colour:
library(dplyr)
library(ggplot2)
mpg1 <-
mpg %>%
group_by(cty, hwy) %>%
summarise(count = n())
ggplot(data = mpg1, mapping = aes(x = cty, y = hwy, colour = count, size = count))+
geom_point() +
scale_size_continuous(breaks = seq(2, 14, by = 2))+
scale_colour_continuous(breaks = seq(2, 14, by = 2))+
guides(colour = guide_legend(), size = guide_legend())
Note: to ensure that only one legend appears, the breaks for size and colour need to be identical.
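One way to keep the two sets of breaks identical is to define them once and reuse them. This is a sketch rather than the answer's exact code: it assumes dplyr's count() with its name argument as a shortcut for group_by() plus summarise(), and the names mpg2, breaks and p2 are illustrative.

```r
library(dplyr)
library(ggplot2)

# count() groups by cty and hwy and tallies the rows in one step
mpg2 <- count(mpg, cty, hwy, name = "count")

# Define the breaks once so size and colour cannot drift apart
breaks <- seq(2, 14, by = 2)

p2 <- ggplot(mpg2, aes(x = cty, y = hwy, colour = count, size = count)) +
  geom_point() +
  scale_size_continuous(breaks = breaks) +
  scale_colour_continuous(breaks = breaks) +
  guides(colour = guide_legend(), size = guide_legend())
p2
```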
Created on 2021-04-01 by the reprex package (v1.0.0)
Long story short, I ran a bunch of stochastic simulations for each of 15 groups, and have one integer per group that I need to add to each violin in the plot, and can't seem to figure out how to do it. Here's a reproducible example:
# Making data
df <- data.frame(c(rep(1,10), rep(2,10), rep(3,10)), sample.int(100, 30), c(rep(85,10), rep(60,10), rep(55,10)))
colnames(df) <- c("Group", "Data", "Extra")
# Grouping data
df$Group <- as.factor(df$Group)
# Plotting
Violin2 <- ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
stat_summary(aes(y = Data), fun=mean, geom="point", color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point", color = "black", shape = 16, size = 3)
#geom_point(aes(y = Extra, color = "#00BB66", shape = 16, size = 3)+
Violin2
So here, I'm saying that within the df, there are three groups: 1, 2, and 3, that are applied to the "Data" column. What I need to add, are the integers from the "Extra" column of the df, as single points on each violin (so the three integers would be 85, 60, and 55).
I initially tried to add a geom_point layer, and thought Extra would be grouped by Group, just as Data was, but that didn't work (Error: Discrete value supplied to continuous scale).
I've been searching around on here a lot, and can't find a solution, so any advice would be greatly appreciated! Thanks so much in advance for any help! :)
This is the data:
And this is the plot so far:
So it's actually just one more line of code: you can stack different geoms together in ggplot, which makes it really easy to do exactly what you're describing. Just add
geom_point(aes(y = Extra)) +
So the whole code would look like this
ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
geom_point(aes(y = Extra), size = 2, colour = "red") +
stat_summary(aes(y = Data), fun=mean, geom="point",
color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point",
color = "black", shape = 16, size = 3)
I've coloured the points red and made them bigger but you can change that. That gives:
Your example works as written. The only thing to change is not to put a constant value for the colour argument inside aes(); constants such as a fixed colour, shape, or size belong outside aes().
# Making data
library(ggplot2)
df <- data.frame(c(rep(1,10), rep(2,10), rep(3,10)), sample.int(100, 10), c(rep(85,10), rep(60,10), rep(55,10)))
colnames(df) <- c("Group", "Data", "Extra")
# Grouping data
df$Group <- as.factor(df$Group)
# Plotting
Violin2 <- ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
stat_summary(aes(y = Data), fun=mean, geom="point", color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point", color = "black", shape = 16, size = 3) +
geom_point(aes(y = Extra))
Violin2
Created on 2021-06-08 by the reprex package (v2.0.0)
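Because Extra is repeated ten times per group, the geom_point layer above draws ten identical points on top of each other. A possible refinement, sketched here with the hypothetical helper data frame extra, is to reduce Extra to one row per group and pass the fixed colour, shape and size outside aes():

```r
library(ggplot2)

set.seed(1)  # only so the random example data is reproducible
df <- data.frame(
  Group = factor(rep(1:3, each = 10)),
  Data  = sample.int(100, 30),
  Extra = rep(c(85, 60, 55), each = 10)
)

# One row per group, so each violin gets exactly one extra point
extra <- unique(df[c("Group", "Extra")])

Violin3 <- ggplot(df, aes(x = Group, y = Data)) +
  geom_violin(aes(fill = Group, colour = Group)) +
  # constants are set outside aes(); only y is mapped to a column
  geom_point(data = extra, aes(y = Extra),
             colour = "#00BB66", shape = 16, size = 3)
Violin3
```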
I want to plot the data as separate bubbles, like the image on the right (I made this in PowerPoint just to visualize it).
At the moment I can only create a plot that looks like the one on the left, where the bubbles overlap. How can I do this in R?
b <- ggplot(df, aes(x = Year, y = Type))
b + geom_point(aes(color = Spp, size = value), alpha = 0.6) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(0.5, 12))
You can pass position_dodge() as the position argument of geom_point. Applied directly to your code, it dodges the points horizontally, so the idea is to swap your x and y variables and use coord_flip to get the layout the right way round:
library(ggplot2)
ggplot(df, aes(y = as.factor(Year), x = Type))+
geom_point(aes(color = Group, size = Value), alpha = 0.6, position = position_dodge(0.9)) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(1, 15)) +
coord_flip()
Does this look like what you are trying to achieve?
EDIT: Adding text in the middle of each point
To add a label to each point, you can use geom_text and give it the same position_dodge2 argument as geom_point.
NB: I use position_dodge2 instead of position_dodge and slightly change the width values because I found position_dodge2 better suited to this case.
library(ggplot2)
ggplot(df, aes(y = as.factor(Year), x = Type))+
geom_point(aes(color = Group, size = Value), alpha = 0.6,
position = position_dodge2(width = 1)) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(3, 15)) +
coord_flip()+
geom_text(aes(label = Value, group = Group),
position = position_dodge2(width = 1))
Reproducible example
As you did not provide a reproducible example, I made one that is maybe not fully representative of your original dataset. If my answer is not working for you, you should consider providing a reproducible example (see here: How to make a great R reproducible example)
Group <- c(LETTERS[1:3],"A",LETTERS[1:2],LETTERS[1:3])
Year <- c(rep(1918,4),rep(2018,5))
Type <- c(rep("PP",3),"QQ","PP","PP","QQ","QQ","QQ")
Value <- sample(1:50,9)
df <- data.frame(Group, Year, Value, Type)
df$Type <- factor(df$Type, levels = c("PP","QQ"))
I have a df where I have made a nice line plot using stat_count, but when I try to add geom_point it won't work.
Without the last part (geom_point(size=2)) it produces a line plot, but with it I get this error:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error: Column y must be a 1d atomic vector or a list
df <- data.frame("id" = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
"bowl" = c("red", "red", "red","green", "green", "green",
"green", "green", "red", "red"),
"year"=c(2001:2003, 2002:2003, 2001:2003, 2001:2002))
library(dplyr)
library(ggplot2)
df %>%
ggplot(aes(x=year, y=count, colour=bowl)) +
stat_count(geom = "line",
aes(y=..count..))+
geom_point(size=2)
I suspect there's just a small adjustment to be made, but I can't seem to find it on my own.
There are two possible approaches:
Using stat_count() and specifying geom
Using geom_line() and geom_point(), resp., and specifying stat
There is a difference in the default value for position which will create different plots.
1. Stacked plot of counts (total counts)
As already mentioned by Z.Lin,
library(ggplot2)
ggplot(df, aes(x = year, y = stat(count), colour = bowl)) +
stat_count(geom = "line") +
stat_count(geom = "point")
will create a stacked line and point plot of counts, i.e., the total number of records per year (regardless of bowl):
As of version 3.0.0 of ggplot2, it is possible to use the new stat() function for calculated-aesthetic variables, so stat(count) replaces ..count.. (from ggplot2 3.3.0 onwards, after_stat(count) is the preferred spelling).
The same plot is created by
ggplot(df, aes(x = year, y = stat(count), colour = bowl)) +
geom_line(stat = "count", position = "stack") +
geom_point(stat = "count", position = "stack")
but we have to specify explicitly that the counts have to be stacked.
2. Line and point plot of counts by colour
If we want to show the counts per year for each value of bowl separately, we can use
ggplot(df, aes(x = year, y = stat(count), colour = bowl)) +
geom_line(stat = "count") +
geom_point(stat = "count")
which produces a line and point plot for each colour.
This can also be achieved by
ggplot(df, aes(x = year, y = stat(count), colour = bowl)) +
stat_count(geom = "line", position = "identity") +
stat_count(geom = "point", position = "identity")
but now we have to specify explicitly not to stack.
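The error in the question most likely arises because df has no column named count, so y = count in the top-level aes() resolves to the function dplyr::count, which geom_point cannot plot. A third option, sketched here under the assumption that precomputing the counts is acceptable, is to tally the rows first and then plot them with ordinary geoms; the name counts is illustrative.

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  bowl = c("red", "red", "red", "green", "green", "green",
           "green", "green", "red", "red"),
  year = c(2001:2003, 2002:2003, 2001:2003, 2001:2002)
)

# One row per year and bowl, with the number of records in column n
counts <- count(df, year, bowl, name = "n")

# n is now a real column, so no stat is needed for either layer
p_counts <- ggplot(counts, aes(x = year, y = n, colour = bowl)) +
  geom_line() +
  geom_point(size = 2)
p_counts
```

This corresponds to the unstacked plot in section 2, since each bowl keeps its own count per year.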
This is very similar to the question asked here. However, in that situation the fill parameter is different for the two plots. In my situation the fill parameter is the same for both plots, but I want different color schemes.
I would like to manually change the color in the boxplots and the scatter plots (for example making the boxes white and the points colored).
Example:
require(dplyr)
require(ggplot2)
n<-4*3*10
myvalues<- rexp((n))
days <- ntile(rexp(n),4)
doses <- ntile(rexp(n), 3)
test <- data.frame(values =myvalues,
day = factor(days, levels = unique(days)),
dose = factor(doses, levels = unique(doses)))
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot( aes(fill = dose))+
geom_point( aes(fill = dose), alpha = 0.4,
position = position_jitterdodge())
produces a plot like this:
Using 'scale_fill_manual()' overwrites the aesthetic on both the boxplot and the scatterplot.
I have found a hack by adding 'colour' to geom_point and then when I use scale_fill_manual() the scatter point colors are not changed:
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(fill = dose), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = factor(test$dose)),
position = position_jitterdodge(jitter.width = 0.1))+
scale_fill_manual(values = c('white', 'white', 'white'))
Are there more efficient ways of getting the same result?
You can use group to set the different boxplots. No need to set the fill and then overwrite it:
ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(group = interaction(day, dose)), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = dose),
position = position_jitterdodge(jitter.width = 0.1))
And you should never use data$column inside aes - just use the bare column. Using data$column will work in simple cases, but will break whenever there are stat layers or facets.
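To get the white boxes and manually coloured points the question asks for, the same idea can be combined with a constant fill and scale_colour_manual(). This is only a sketch: the three colour values are arbitrary placeholders, and it assumes the points dodge on colour in the same order as the grouped boxplots.

```r
library(dplyr)    # for ntile()
library(ggplot2)

set.seed(42)  # only so the random example data is reproducible
n <- 4 * 3 * 10
days  <- ntile(rexp(n), 4)
doses <- ntile(rexp(n), 3)
test <- data.frame(values = rexp(n),
                   day  = factor(days,  levels = unique(days)),
                   dose = factor(doses, levels = unique(doses)))

p_white <- ggplot(test, aes(x = day, y = values)) +
  # constant fill outside aes(): every box is white
  geom_boxplot(aes(group = interaction(day, dose)),
               fill = "white", outlier.shape = NA) +
  # only the points are mapped to dose, so only they use the colour scale
  geom_point(aes(colour = dose),
             position = position_jitterdodge(jitter.width = 0.1)) +
  scale_colour_manual(values = c("#1b9e77", "#d95f02", "#7570b3"))
p_white
```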
An example using ggplot2 to graph groups of data points and lines connecting the means for each group, mapped with the same aes for shape and for linetype:
p <- ggplot(mtcars, aes(gear, mpg, shape = factor(cyl), linetype = factor(cyl))) +
geom_point(size = 2) +
stat_summary(fun.y = mean, geom = "line", size = 1) +
scale_shape_manual(values = c(1, 4, 19))
Problem is that point symbols in the legend appear a bit too small to see, relative to the line symbols:
Trying to enlarge point size in legend also enlarges lineweight, so that is not useful here.
p1 <- p + guides(shape = guide_legend(override.aes = list(size = 4)))
It would be nice if lineweight were a distinct aesthetic from size.
I tried adding
+ guides(linetype = guide_legend(override.aes = list(size = 1)))
which just gives a warning.
> Warning message:
In guide_merge.legend(init, x[[i]]) : Duplicated override.aes is ignored.
It seems to make no difference either if I move the linetype aes out of ggplot() and into stat_summary(). If I wanted only the point symbols, I could eliminate lines from the legend this way.
p2 <- p + guides(shape = guide_legend(override.aes = list(size = 4, linetype = 0)))
Instead, (keeping small point symbols in the graph itself) I want one single legend with both big point symbols as in this last image and thin line symbols as in the first image. Is there a way to do this?
It sure does seem to be difficult to set those properties independently. I was only able to come up with a hack, and if your real data is much different it will likely have to be adjusted. What I did was use override.aes to set the size of the point, then build the plot and manually change the line width settings in the actual low-level grid objects. Here's the code:
library(ggplot2)
library(grid)  # provides grid.draw()
pp<-ggplot(mtcars, aes(gear, mpg, shape = factor(cyl), linetype = factor(cyl))) +
geom_point(size = 3) +
stat_summary(fun.y = mean, geom = "line", size = 1) +
scale_shape_manual(values = c(1, 4, 19)) +
guides(shape=guide_legend(override.aes=list(size=5)))
build <- ggplot_build(pp)
gt <- ggplot_gtable(build)
segs <- grepl("geom_path.segments", sapply(gt$grobs[[8]][[1]][[1]]$grobs, '[[', "name"))
gt$grobs[[8]][[1]][[1]]$grobs[segs]<-lapply(gt$grobs[[8]][[1]][[1]]$grobs[segs],
function(x) {x$gp$lwd<-2; x})
grid.draw(gt)
The magic number "8" is the index where gt$grobs[[8]]$name=="guide-box", so I knew I was working on the legend. I'm not the best with grid graphics and gtables yet, so perhaps someone can suggest a more elegant way.
Using the grid function grid.force(), all the grobs in the ggplot become visible to grid's editing functions, including the legend keys. Thus, grid.gedit can be applied, and the required edit to the plot can be achieved using one line of code. In addition, I increase the width of the legend keys so that the different line types for line segments are clear.
library(ggplot2)
library(grid)
p <- ggplot(mtcars, aes(gear, mpg, shape = factor(cyl), linetype = factor(cyl))) +
geom_point(size = 2) +
stat_summary(fun.y = mean, geom = "line", size = 1) +
scale_shape_manual(values = c(1, 4, 19)) +
theme(legend.key.width = unit(1, "cm"))
p
grid.ls(grid.force()) # To get the names of all the grobs in the ggplot
# The edit - to set the size of the point in the legend to 4 mm
grid.gedit("key-[-0-9]-1-1", size = unit(4, "mm"))
To save the modified plot
g <- grid.grab()
ggsave(plot = g, filename = "test.pdf")
I see what you mean. Here is a solution that fits what you're looking for, I think. It keeps both of the legends separate, but places them side by side. The labels and title of the shape are left out, so that the labels to the far right correspond to both the shapes and linetypes.
I'm posting this as a separate answer because I think both methods will be valid for future readers.
p2 <- ggplot(mtcars, aes(gear, mpg, shape = factor(cyl),
linetype = factor(cyl))) +
geom_point(size = 2) +
stat_summary(fun.y = mean, geom = "line", size = 1) +
# blank labels for the shapes
scale_shape_manual(name="", values = c(1, 4, 19),
labels=rep("", nlevels(factor(mtcars$cyl))))+
scale_linetype_discrete(name="Cylinders")+
# legends arranged horizontally
theme(legend.box = "horizontal")+
# ensure that shapes are to the left of the lines
guides(shape = guide_legend(order = 1),
linetype = guide_legend(order = 2))
p2
One way to ensure separate legends is to give them different names (or other differences that would preclude them being grouped together).
Here's an example based on the code you supplied:
p <- ggplot(mtcars, aes(gear, mpg, shape = factor(cyl), linetype = factor(cyl))) +
geom_point(size = 2) +
stat_summary(fun.y = mean, geom = "line", size = 1) +
scale_shape_manual(name="Name 1", values = c(1, 4, 19))+
scale_linetype_discrete(name="Name2")
p
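For readers on newer versions, a simpler route may be available. The sketch below assumes ggplot2 3.4.0 or later, where line thickness is controlled by the linewidth aesthetic instead of size, and where the fun argument of stat_summary() replaces fun.y. Under that assumption, overriding size in the legend should enlarge only the point symbols and leave the line keys thin. The name p_new is illustrative.

```r
library(ggplot2)

p_new <- ggplot(mtcars, aes(gear, mpg,
                            shape = factor(cyl), linetype = factor(cyl))) +
  geom_point(size = 2) +
  # linewidth (not size) sets the line thickness in ggplot2 >= 3.4.0
  stat_summary(fun = mean, geom = "line", linewidth = 1) +
  scale_shape_manual(values = c(1, 4, 19)) +
  # size now affects only the point keys in the merged legend
  guides(shape = guide_legend(override.aes = list(size = 4)))
p_new
```

On older ggplot2 versions this code will not behave the same way, and the grid-based edits above remain the relevant workaround.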