Change transparency, shape and size by a categorical variable - r

I am trying to plot using ggplot and trying to set the transparency, size and shape for geom_point using a binary variable in my dataset.
For example, if binary_variable == 1 then set the size to 1, shape = triangle, transparency = 0.2, if binary_variable == 0 set the size to 0.5 etc.
I have been able to make the colour change as follows:
library(ggplot2)
df <- data.frame(variable1 = 1:5,
variable2 = 1:5,
binary = c(0,0,0,1,1))
ggplot(df, aes(x = variable1, y = variable2, colour = as.factor(binary))) +
geom_point(size = 2, alpha = 0.3) +
scale_colour_manual(values = c("grey", "black"), labels = c("cat1", "cat2")) +
theme_bw()

You can control shape, colour, alpha and other aesthetics in the same way using the scale_X_manual functions. See the help page for all the different ways these can be controlled.
The key to making this work is to add the variable you want to control to the aes part of the ggplot call.
Here is an example:
df$binary <- as.factor(df$binary)
ggplot(df, aes(x = variable1, y = variable2, colour = binary, shape = binary, alpha = binary)) +
geom_point(size = 2) +
scale_colour_manual(values = c("blue", "red")) +
scale_shape_manual(values=c(16,17)) +
scale_alpha_manual(values=c(1, 0.5)) +
theme_bw()
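The question also asks about size, which the example above leaves fixed. A minimal sketch that maps size the same way, assuming the sizes 1 and 0.5 and the alpha of 0.2 from the question (the alpha of 1 for the other group is an arbitrary choice):

```r
library(ggplot2)

df <- data.frame(variable1 = 1:5,
                 variable2 = 1:5,
                 binary = factor(c(0, 0, 0, 1, 1)))

# Map every aesthetic that should depend on `binary` inside aes(),
# then set the per-level values with the matching scale_*_manual().
p <- ggplot(df, aes(x = variable1, y = variable2,
                    shape = binary, size = binary, alpha = binary)) +
  geom_point() +
  scale_shape_manual(values = c("0" = 16, "1" = 17)) +  # circle, triangle
  scale_size_manual(values = c("0" = 0.5, "1" = 1)) +
  scale_alpha_manual(values = c("0" = 1, "1" = 0.2)) +
  theme_bw()
p
```

Naming the entries of each values vector ties a value to a factor level explicitly instead of relying on level order.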

Related

geom_hline ignoring colour aes for some reason

line_data <- data.frame(value = c(1,2), color = as.factor(c("blue", "green")))
plot1 <- plot1 +
geom_hline(aes(yintercept = value, colour = color), line_data, linetype = "dashed", size = 0.5)
The above is a snippet of my code. No matter what I assign to the color column, it is ignored. I can assign integers or continuous numbers and they are ignored as well.
EDITED:
The reason is that plot1 already has a scale_color_manual added to it. Now the challenge becomes how to make it work without having to remove the scale_color_manual.
Please post reproducible code to help us answer your question. We don't know what plot1 is.
Use scale_colour_identity to tell ggplot2 to interpret the variable directly as a color:
line_data <- data.frame(value = c(1,2), color = as.factor(c("blue", "green")))
ggplot(line_data) +
geom_hline(aes(yintercept = value, colour = color), line_data, linetype = "dashed", size = 0.5) +
scale_colour_identity()
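By default scale_colour_identity() draws no legend. A minimal sketch, assuming a legend for the lines is wanted, passes guide = "legend". The colour names are stored as plain strings here, which is a variation on the factor column used above:

```r
library(ggplot2)

# Colour names stored as plain strings; the identity scale uses them as-is.
line_data <- data.frame(value = c(1, 2),
                        color = c("blue", "green"),
                        stringsAsFactors = FALSE)

p <- ggplot(line_data) +
  geom_hline(aes(yintercept = value, colour = color), linetype = "dashed") +
  # guide = "legend" asks for a legend; the identity scale omits it by default
  scale_colour_identity(guide = "legend")
p
```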
EDIT: To make a new color legend compatible with an existing one, re-specify scale_colour_manual. You will still get the warning "Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.", but it works:
library(ggplot2)
# example plot1 with manual scale
plot1data = data.frame(x=c(1,2,3,4),
y=c(1,2,3,4),
color=factor(c(1,1,1,2)))
plot1 = ggplot(plot1data, aes(x=x, y=y, colour=color)) +
geom_point() +
scale_colour_manual(values=c('1'='green','2'='red'))
# data you want to add to the plot
line_data <- data.frame(value = c(1,2), color = as.factor(c("blue", "green")))
# assume you have the plot1 object but no access to the code that generated it
# extract colors from plot1
ggdata = ggplot_build(plot1)$data[[1]]
plot1_colours = ggdata$colour
names(plot1_colours) = ggdata$group
# use the values originally specified for plot1 (plot1_colours); add additional custom values
plot1 +
geom_hline(aes(yintercept = value, colour = color), data = line_data, linetype = "dashed", size = 0.5) +
scale_colour_manual(values=c(plot1_colours, 'blue'='blue', 'green'='green'))
Specify breaks if you want to remove some values from the legend:
plot1 +
geom_hline(aes(yintercept = value, colour = color), data = line_data, linetype = "dashed", size = 0.5) +
scale_colour_manual(values=c(plot1_colours, 'blue'='blue', 'green'='green'), breaks = names(plot1_colours))
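ggplot_build() returns one row per plotted point, so plot1_colours above carries a repeated name for every point in the same group. A sketch, under the same assumption that the first layer of plot1 holds the coloured points, that collapses the vector to one entry per group before it is passed to scale_colour_manual. Note that the names are group numbers, which only match the factor levels because the levels in this example happen to be 1 and 2:

```r
library(ggplot2)

plot1data <- data.frame(x = 1:4, y = 1:4, color = factor(c(1, 1, 1, 2)))
plot1 <- ggplot(plot1data, aes(x = x, y = y, colour = color)) +
  geom_point() +
  scale_colour_manual(values = c("1" = "green", "2" = "red"))

# One row per point: colours and group ids repeat within a group.
ggdata <- ggplot_build(plot1)$data[[1]]
plot1_colours <- setNames(ggdata$colour, ggdata$group)

# Keep the first entry for each group so every name appears once.
plot1_colours <- plot1_colours[!duplicated(names(plot1_colours))]
plot1_colours
```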

How to avoid overlapping bubbles in bubble plot?

I want to plot the data separately in a bubble plot, like the image on the right (I made this in PowerPoint just to visualize it).
At the moment I can only create a plot that looks like the one on the left, where the bubbles overlap. How can I do this in R?
b <- ggplot(df, aes(x = Year, y = Type))
b + geom_point(aes(color = Spp, size = value), alpha = 0.6) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(0.5, 12))
You can pass position_dodge() as the position argument of geom_point. Applied directly to your code, it spreads the points horizontally, so the idea is to swap your x and y variables and use coord_flip to get the orientation you want:
library(ggplot2)
ggplot(df, aes(y = as.factor(Year), x = Type))+
geom_point(aes(color = Group, size = Value), alpha = 0.6, position = position_dodge(0.9)) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(1, 15)) +
coord_flip()
Does it look like what you are trying to achieve?
EDIT: Adding text in the middle of each points
To add a label to each point, you can use geom_text and give it the same position_dodge2 argument as geom_point.
NB: I use position_dodge2 instead of position_dodge and slightly change the width values because I found position_dodge2 better suited to this case.
library(ggplot2)
ggplot(df, aes(y = as.factor(Year), x = Type))+
geom_point(aes(color = Group, size = Value), alpha = 0.6,
position = position_dodge2(width = 1)) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(3, 15)) +
coord_flip()+
geom_text(aes(label = Value, group = Group),
position = position_dodge2(width = 1))
Reproducible example
As you did not provide a reproducible example, I made one that is maybe not fully representative of your original dataset. If my answer is not working for you, you should consider providing a reproducible example (see here: How to make a great R reproducible example)
Group <- c(LETTERS[1:3],"A",LETTERS[1:2],LETTERS[1:3])
Year <- c(rep(1918,4),rep(2018,5))
Type <- c(rep("PP",3),"QQ","PP","PP","QQ","QQ","QQ")
Value <- sample(1:50,9)
df <- data.frame(Group, Year, Value, Type)
df$Type <- factor(df$Type, levels = c("PP","QQ"))

How do I add a legend to identify vertical lines in ggplot?

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. I'll go through the chart and then the code.
The chart -
The code -
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding the text in the plot, I'd like to create a legend to identify each of the lines. I'd like to give each of them its own color and then have a legend to identify each. Something like this -
Can I make my own custom legend like that?
1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, size = 1) +
scale_colour_manual(values = line.data$color)
2) Alternatively, put the labels right on the plot itself to avoid an extra legend, using the line.data data frame above. This also has the advantage of avoiding possible multiple legends with the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, size = 1)
3) If the real problem is that you want two color legends then there are two packages that can help.
3a) ggnewscale Any color geom that appears after invoking new_scale_color will get its own scale.
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() +
# map colour to the Lines labels so the values below are matched to "lower" and "upper" in order
geom_vline(aes(xintercept = xintercept, colour = Lines), line.data,
size = 1) +
scale_colour_manual(values = line.data$color)
3b) relayer The experimental relayer package (only on GitHub) allows one to define two color aesthetics, color and color2, say, and then have separate scales for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = Lines), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)
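Approach 1 carries over to a date axis like the one in the question. A minimal sketch with made-up data: the usage and releases data frames, the series values and the second release date are invented for illustration, and only the KitKat date comes from the question.

```r
library(ggplot2)

# Hypothetical monthly series standing in for the question's data.
usage <- data.frame(
  month = seq(as.Date("2013-01-01"), by = "month", length.out = 24),
  monthly_change = sin(seq_len(24) / 3) / 10
)

# One row per vertical line; `release` supplies the legend labels.
releases <- data.frame(
  date = as.Date(c("2013-10-01", "2014-06-01")),  # second date is made up
  release = c("KitKat", "hypothetical release")
)

p <- ggplot(usage, aes(month, monthly_change)) +
  geom_line() +
  # Mapping colour inside aes() is what creates a legend entry per line.
  geom_vline(aes(xintercept = date, colour = release), data = releases) +
  scale_colour_manual(name = "Release",
                      values = c("KitKat" = "red",
                                 "hypothetical release" = "blue"))
p
```

Because the Date column is mapped inside aes(), it goes through the date scale and needs no as.numeric() conversion.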
You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contains 100 samples from a normal distribution (monthly_change in your data), 5 groupings (similar to the os variable in your data) and a sequence of dates from an arbitrary starting point.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Please do remember next time that if you provide your data, or a minimal reproducible example, we can answer your question better without having to generate fake data.

ggplot2 non-linearly adjust scale with color

I have two questions:
Although the data range from -1 to 5, most values lie between -1 and 1. Thus, I was wondering if I could make the relationship between result and color non-linear (that is, more change between -1 and 1, but less change between 2 and 5). Would this be possible using ggplot2?
How can I move the scale bar (the colour legend) inside the map, say to the bottom-right corner?
Here is my code:
library(ggplot2)
library(ggmap)
library(data.table)
map<-get_map(location='united states', zoom=4, maptype = "terrain",
source='google',color='bw')
ggmap(map) + geom_point(
aes(x=longitude, y=latitude, colour=V1),
data=plot.data, alpha=0.3, na.rm = TRUE, show.legend = TRUE) +
scale_color_gradient(low="red", high = "red4", name = "Level")
You can use scale_colour_gradientn and define values (from ?scale_colour_gradientn: if colours should not be evenly positioned along the gradient this vector gives the position (between 0 and 1) for each colour in the colours vector.), e.g.:
ggplot(data = iris, aes(x = Species, y = Sepal.Width, colour = Sepal.Length)) +
geom_point() +
scale_colour_gradientn(colours = c("blue", "red", "orange"),
values = c(0, 0.1, 1))
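For the data in the question (range -1 to 5, most values between -1 and 1), the positions in values can be computed from the data scale rather than typed by hand. A sketch using scales::rescale(); the break points -1, 0, 1 and 5, the colours and the simulated data frame are assumptions chosen for illustration:

```r
library(ggplot2)
library(scales)

# Break points on the data scale: three of the four colours fall in [-1, 1].
breaks <- c(-1, 0, 1, 5)

# rescale() maps the breaks linearly onto [0, 1]: (x - min) / (max - min).
positions <- rescale(breaks)

# Simulated stand-in for the question's data: most values lie in [-1, 1].
set.seed(1)
d <- data.frame(x = runif(200), y = runif(200),
                V1 = c(runif(180, -1, 1), runif(20, 1, 5)))

p <- ggplot(d, aes(x, y, colour = V1)) +
  geom_point() +
  scale_colour_gradientn(colours = c("yellow", "orange", "red", "red4"),
                         values = positions,
                         limits = range(breaks),
                         name = "Level")
p
```

Fixing limits to the range of the breaks keeps the computed positions aligned with the data scale even if the observed minimum and maximum differ slightly from -1 and 5.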
To change the position of the legend have a look at ?theme > legend.position and/or legend.justification
You could try this command:
+theme(legend.position = c(xxx, xxx))
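A sketch of that theme() call with concrete numbers, assuming the legend should sit in the bottom-right corner of the panel and that ggplot2 3.5.0 or later is installed. In those versions the numeric form legend.position = c(x, y) shown above is deprecated in favour of legend.position = "inside" together with legend.position.inside. The coordinates are fractions of the panel area, and legend.justification sets which corner of the legend box is anchored there:

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Petal.Length)) +
  geom_point() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.98, 0.02),  # near the bottom-right corner
        legend.justification = c(1, 0))          # anchor the legend's bottom-right corner
p
```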
Building upon beetroot's answer, one can write a short helper function to transform values on the original scale to the 0-1 scale used for plotting. The helper returns the two ends of the data range plus whichever of low, mid and high are supplied, so the colours vector needs one colour per returned position (four in the call below).
gradient_setter <- function(x, low = NULL, mid = 0, high = NULL){
rn <- range(x, na.rm = TRUE)
(c(rn[1], low, mid, high, rn[2]) - rn[1])/(rn[2] - rn[1])
}
ggplot(data = iris, aes(x = Species, y = Sepal.Width, colour = Sepal.Length)) +
geom_point() +
scale_colour_gradientn(colours = c("blue", "red", "orange", "yellow"),
values = gradient_setter(iris$Sepal.Length, mid = 5, high = 5.8))

ggplot outline jitter datapoints

I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
However, that results in something like this:
Because jitter randomly offsets the data, the 2 geom_jitters are not in line with each other. How do I ensure the outlines are in the same place as the fill points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old and I'm not sure if anything new has been added to ggplot that would solve this issue.
The code above works if, instead of using geom_jitter, I use the regular geom_point, but I have too many overlapping points for that to be useful
EDIT:
The solution in the posted answer works. However, it doesn't quite cooperate for some of my other graphs where I'm binning by some other variable and using that to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?
You don't actually have to use two layers; you can just use the fill aesthetic of a plotting character with a hole in it:
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
Edit:
For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")
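If two separate layers are still needed, for instance to style the outline independently, one way to keep them aligned is to give both layers a position_jitter() with the same seed. This is a sketch that assumes a ggplot2 version recent enough to have the seed argument; the data and the width and height values are invented for illustration:

```r
library(ggplot2)

set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# The same seed makes both layers draw identical random offsets.
jit <- position_jitter(width = 0.2, height = 0.2, seed = 1)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(size = 3, alpha = 0.6, colour = "skyblue", position = jit) +
  geom_point(size = 3, shape = 1, colour = "black", position = jit)
p
```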
