ggplot2 - Assign symbol fill based on factors

I am relatively new to R and new to ggplot2. I have a large data set and would like to make plots where the symbol fill for any given data series is governed by a factor in the data set. An example data frame is shown below, where both "Station" and "Flag" are factors:
Day Station Value Flag
1 1 0.0 b
2 1 1.0 a
3 1 2.0 a
1 2 2.3 a
2 2 1.0 a
3 2 0.2 b
1 3 0.5 b
2 3 0.5 b
3 3 0.5 b
I can control symbol shape and color using the factor "Station", but I also want to control symbol fill using the factor "Flag": for example, filled symbols for Flag = "a" and open symbols for Flag = "b". Any given station will have a mix of filled and open symbols. I can do this with the base plotting functions of R, but I can't get it to work in ggplot2, and the base approach would be very tedious for the real data set, which has many more stations and other factors of interest.
As I am new, the system won't let me post images (not enough reputation points), but here is the code I use to generate the figure I am looking for, followed by the failed ggplot2 attempt. Not shown are the numerous variations on "scale_fill_manual" that have not worked.
library(ggplot2)
data <- read.table("dfquestion.csv", header = TRUE, sep = ",", na.strings = "",
colClasses = c("Day" = "numeric",
"Station" = "factor",
"Value" = "numeric",
"Flag" = "factor"
)
)
##---------------------------------------------------
# Select parameter to graph
x.value <- "Day"
y.value <- "Value"
##---------------------------------------------------
# Subset data by Station
##---------------------------------------------------
Sta1 <- subset(data, Station == "1")
Sta2 <- subset(data, Station == "2")
Sta3 <- subset(data, Station == "3")
##---------------------------------------------------
#
# Set symbol colors and background.
# open symbols/ clear background = value below reporting limit
# Black = Station 1
# Red = Station 2
# Green = Station 3
#
##---------------------------------------------------
bg.list1 <- rep(0,length(Sta1$Flag))
bg.list1[Sta1$Flag == "a"] <- "black"
bg.list1[Sta1$Flag == "b"] <- NA
#
bg.list2 <- rep(0,length(Sta2$Flag))
bg.list2[Sta2$Flag == "a"] <- "red"
bg.list2[Sta2$Flag == "b"] <- NA
#
bg.list3 <- rep(0,length(Sta3$Flag))
bg.list3[Sta3$Flag == "a"] <- "green"
bg.list3[Sta3$Flag == "b"] <- NA
##---------------------------------------------------
#
# Symbol type
# circle = Sta1; pch = 21
# square = Sta2; pch = 22
# triangle = Sta3; pch = 24
#
##---------------------------------------------------
opar <- par(no.readonly=TRUE)
par(oma = c(0,1,0,2.5))
# pch values follow the symbol table above and the legend below (21, 22, 24)
plot(Sta1$Day, Sta1$Value, type = "b", pch = 21, bg = c(bg.list1), cex = 1.2, col = "black",
xlim = c(min(Sta1$Day), max(Sta1$Day)),
ylim = c(range(na.omit(Sta1$Value),
na.omit(Sta2$Value),
na.omit(Sta3$Value)
)),
xlab = x.value,
ylab = y.value,
cex.lab = 1.25, cex.axis = 1.25
)
points(Sta2$Day, Sta2$Value, type = "b", pch = 22, bg = c(bg.list2), cex = 1.2, col = "red")
points(Sta3$Day, Sta3$Value, type = "b", pch = 24, bg = c(bg.list3), cex = 1.2, col = "green")
###
##---------------------------------------------------
# Creates legend outside primary graph
par(fig = c(0, 1, 0, 1), oma = c(0,0,0,0), mar = c(0,0,0,0), new = TRUE)
plot(0,0, type = "n", xaxt = "n", yaxt = "n")
legend("topright", legend = c(paste("Sta1"),
paste("Sta2"),
paste("Sta3"),
paste(""),
paste("above"),
paste("below")
),
pch = c(21, 22, 24,
NA,
21, 21),
lty = c(NA), lwd = c(NA),
col = c("black", "red", "green",
NA,
"red", "red"),
pt.bg = c("black", "red", "green",
NA,
"red", NA
),
text.col = "black",
bty = "n", cex = 0.95,
inset=c(-0.01,0.14))
#
par(opar)
#
##---------------------------------------------------
# Attempt to do the same thing in ggplot
#
p <- ggplot(data = data, aes(x=Day, y=Value, shape = Station, color = Station, fill = Flag)) +
geom_line(size = 1) +
geom_point(size = 4)
p <- p + scale_shape_manual(values = c(21, 22, 23))
p <- p + scale_color_manual(values = c("black", "red", "green"))
# Adjust legend fills - filled = detect, open = non-detect
p <- p + guides(fill = guide_legend(override.aes = list(shape = 21, fill = c("black", NA))))
print(p)

I think alpha might be of use here. I came up with a solution, but I'm not entirely happy with it. I expected alpha to affect only the fill of points that have both color and fill, but it changed the transparency of both. I ended up adding a second geom_point layer as a workaround: the first layer draws the outlines without fill, and the second draws the fill with alpha mapped to Flag.
ggplot(data = data, aes(x=Day, y=Value, shape = Station, fill = Station, color = Station)) +
geom_line(size = 1, show.legend = FALSE) +
geom_point(size = 4, aes(fill = NULL)) +
geom_point(size = 4, aes(alpha = Flag)) +
scale_shape_manual(values = c(21, 22, 23)) +
scale_color_manual(values = c("black", "red", "green")) +
scale_fill_manual(values = c("black", "red", "green")) +
scale_alpha_manual(values = c(1, 0)) +
guides(alpha = guide_legend(override.aes = list(shape = 21, fill = c("black", NA), alpha = c(1,1))))
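
An alternative sketch, not the approach above: with shapes 21 to 25 the outline follows colour and the interior follows fill, so one option is to precompute the fill for each row and pass it through scale_fill_identity. The data frame below is retyped from the question; `station_cols` and the `fill_col` column are illustrative names, and white stands in for "open" so that the connecting line does not show through the symbol.

```r
library(ggplot2)

# Example data retyped from the question
dat <- data.frame(
  Day     = rep(1:3, times = 3),
  Station = factor(rep(1:3, each = 3)),
  Value   = c(0.0, 1.0, 2.0, 2.3, 1.0, 0.2, 0.5, 0.5, 0.5),
  Flag    = factor(c("b", "a", "a", "a", "a", "b", "b", "b", "b"))
)

# One colour per station (illustrative name)
station_cols <- c("1" = "black", "2" = "red", "3" = "green")

# Illustrative helper column: station colour when Flag is "a", white otherwise
dat$fill_col <- unname(ifelse(dat$Flag == "a",
                              station_cols[as.character(dat$Station)],
                              "white"))

p <- ggplot(dat, aes(Day, Value, shape = Station, colour = Station)) +
  geom_line() +
  geom_point(aes(fill = fill_col), size = 4) +
  scale_shape_manual(values = c(21, 22, 24)) +   # fillable circle, square, triangle
  scale_colour_manual(values = station_cols) +
  scale_fill_identity()                          # use fill_col values as literal colours

print(p)
```

By default scale_fill_identity draws no legend, so a filled/open key for Flag would still have to be added separately.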

Related

how to merge two legends with shapes in ggplot2?

If I plot this data:
data <- data.frame(Xdata = rnorm(6),
Ydata = rnorm(6),
Group1 = c("ld-01", "ld-02", "ld-03",
"ld-04", "ld-05", "ld-06"),
Group2 = c("ld", "ld", "l",
"ld4", "l", "ld6"))
library(ggplot2)
ggplot(data, aes(Xdata, Ydata, color = Group2, shape = Group1)) +
geom_point(size = 7)
I want to replace the upper legend with "this one".

One option would be to map Group1 on the color aes too and use scale_color_manual to assign your desired colors like so:
set.seed(123)
data <- data.frame(
Xdata = rnorm(6),
Ydata = rnorm(6),
Group1 = c(
"ld-01", "ld-02", "ld-03",
"ld-04", "ld-05", "ld-06"
),
Group2 = c(
"ld", "ld", "l",
"ld4", "l", "ld6"
)
)
library(ggplot2)
colors <- scales::hue_pal()(4)
pal_color <- colors[c(2, 2, 1, 3, 1, 4)]
ggplot(data, aes(Xdata, Ydata, color = Group1, shape = Group1)) +
geom_point(size = 7) +
scale_color_manual(values = pal_color)
EDIT: If you want to keep both legends and only color the shape legend, one way to go is to use the override.aes argument of guide_legend, which allows you to set the colors like so:
library(ggplot2)
#colors <- scales::hue_pal()(4)
colors <- c("black", "blue", "red", "green")
pal_color <- colors[c(2, 2, 1, 3, 1, 4)]
ggplot(data, aes(Xdata, Ydata,
color = Group2, shape = Group1
)) +
geom_point(size = 7) +
scale_color_manual(values = colors) +
guides(shape = guide_legend(override.aes = list(color = pal_color)))

How to represent different borders in tmap legend

My map looks similar to this one:
library(sf)
library(dplyr)
library(tmap)
# import shapefile
shape_data <- system.file("shape/nc.shp", package = "sf")
shape_data <- st_read(shape_data)
sample_data <- filter(shape_data, CNTY_ID > 2100)
shape_area <- st_union(shape_data)
#map data
tm_shape(shape_data) + tm_borders(alpha = 0.4,lwd = 0.1, col = "blue") +
tm_fill(col = "BIR79", style = "equal") +
tm_shape(sample_data) + tm_borders(col = "red", lwd = 3, lty = 4) +
tm_shape(shape_area) + tm_borders(col = "black", lwd = 2)
The polygons in the map are distinguished by three types of boundaries, with different colours and widths and, in one case, dashes.
Now I would like to represent these three boundaries in a legend, either in addition to the data legend or separately. The legend should show each boundary type (red dashed, black, and blue, each with its own width) with text next to it explaining what it represents, e.g. districts, the boundary of the study area, and so on.
Do you have any idea how to do this with tmap? Thanks

Here's a solution using the viewport function from the grid package:
library(tmap)
library(sf)
library(tidyverse)
library(grid)
shape_data <- system.file("shape/nc.shp", package="sf")
shape_data <- st_read(shape_data)
sample_data <- filter(shape_data, CNTY_ID >2100)
shape_area <- st_union(shape_data)
tm1 <- tm_shape(shape_data) +
tm_fill(col = "BIR79", style = "equal") +
tm_shape(shape_data) + tm_borders(alpha = 0.4,lwd = 1.5, col = "blue") +
tm_shape(sample_data) + tm_borders(col = "red", lwd = 3, lty = 4) +
tm_shape(shape_area) + tm_borders(col = "black", lwd = 2)
# add second legend
tm_leg = tm_shape(sample_data) + tm_fill(alpha = 0) + # dummy layer
tm_add_legend("line",
col = c("blue", "red", "black"),
lwd = c(1.5, 3, 2),
lty = c(1, 4, 1),
labels = c("var1", "var2", "var3"),
title = "type") +
tm_layout(legend.position = c(.15 ,-.2),
bg.color = NA,
frame = FALSE,
legend.width = 10,
legend.height = 10)
tmap_save(tm1, filename = "test_map.png", insets_tm = tm_leg, insets_vp = viewport())

Center stat_summary mean for different number of factor levels with position_dodge

I have an issue with the correct dodging of stat_summary together with position_dodge. I will use the diamonds data to illustrate. The goal is to visualize how a variable used for prediction (here, price) is distributed among the classes I want to predict (carat > 1 and carat < 1).
Therefore, I binned the continuous carat values into two classes, carat > 1 and carat < 1, and the average price is calculated for both classes. In an additional step, carat is binned again within the ggplot2 call for the fill aesthetic. The first fill bin, 0-1, goes together with the class carat < 1. In my actual data, the first class would be "absent" and the second class "present". The other fill bins therefore belong to class carat > 1 and allow a more detailed display of the distribution.
My question: is it possible to center the mean geom_point for each class, regardless of the number of fill bins? In this case, the blue dot should be in the center of the blue-circled column, and the red dot should be centered over however many fill bins are present (2 or 3).
I've been playing around with values for stat_summary(position = position_dodge(width = 0.9)), but I just don't get there completely.
library(ggplot2)
data("diamonds")
set.seed(10)
diamond_subset <- diamonds[sample(nrow(diamonds), 1500),]
diamond_subset$carat[diamond_subset$carat < 1] <- 0
# plot price of diamond by class label and carat intervals
# split carat values into 2 classes for classification
diamond_subset$classes <- cut(diamond_subset[["carat"]],
breaks = c(-Inf, 1, Inf), labels = c("carat_below_1", "carat_above_1"))
# set intervals for continuous carat values, all intervals > 1 belong to class "carat_above_1"
break_points <- c(0,1,2,3,4,5.1)
ggplot(diamond_subset, aes(x = cut, y = price, colour = classes)) +
geom_point(aes(fill = cut(carat, break_points, include.lowest = TRUE)), pch = 21, alpha = 0.4, size = 2,
position = position_jitterdodge(dodge.width = 0.7, jitter.width = 0.5)) +
stat_summary(position=position_dodge(width = 0.9), fill = "black",
fun = mean, geom = "point", shape = 21, size = 4, stroke = 1.5, alpha = 1) +
scale_fill_manual(values = c("white", "green", "yellow", "red", "black"),
name = "Carat") +
scale_colour_manual(
values = c("blue", "red"),
name = "Average price per class",
breaks = c("carat_below_1", "carat_above_1"),
labels = c("Carat < 1", "Carat > 1")
)
Extra question: Is a possible solution also applicable for more than 2 classes? E.g.
diamond_subset$classes <- cut(diamond_subset[["carat"]],
breaks = c(-Inf, 1, 2, Inf), labels = c("carat_below_1", "carat_above_1", "carat_above_2"))
ggplot(diamond_subset, aes(x = cut, y = price, colour = classes)) +
geom_point(aes(fill = cut(carat, break_points, include.lowest = TRUE)), pch = 21, alpha = 0.4, size = 2,
position = position_jitterdodge(dodge.width = 0.7, jitter.width = 0.5)) +
stat_summary(position=position_dodge(width = 0.9), fill = "black",
fun = mean, geom = "point", shape = 21, size = 4, stroke = 1.5, alpha = 1) +
scale_fill_manual(values = c("white", "green", "yellow", "red", "black"),
name = "Carat") +
scale_colour_manual(
values = c("blue", "red", "black"),
name = "Average price per class",
breaks = c("carat_below_1", "carat_above_1", "carat_above_2"),
labels = c("Carat < 1", "Carat > 1", "Carat > 2")
)
EDIT:
If no one has a solution, I can of course save the plots as PDF and center the dots manually.

Why does my ggplot have the wrong y-axis scale?

I'm using a concatenated version of frankbi's Price of Weed data (https://github.com/frankbi/price-of-weed for the original, https://github.com/Travis-Barton/Github_code for the cleaned version). I turn the histograms of HighQ/MedQ/LowQ into density lines. Then I try to plot those density lines on top of one another in ggplot, but my axes are coming out all wrong.
This is what I should have:
This is what I end up getting:
Why is ggplot scaling my x-axis? Below is my code:
dat <- marijuana.street.price.clean
hist(dat$HighQ)
hist(dat$MedQ)
hist(dat$LowQ)
plot(1, ylim = c(0, .02), xlim = c(0, 800))
lines(density(dat$HighQ), lwd = 3, col = 'green', lty = 2)
lines(density(dat$MedQ), lwd = 3, col = 'yellow', lty = 2)
lines(density(dat$LowQ, na.rm = T), lwd = 3, col = 'red', lty = 2)
dat2 <- data.frame(indexH = density(dat$LowQ, na.rm = T)$x,
propH = density(dat$HighQ)$y,
indexM = density(dat$MedQ)$x,
propM = density(dat$MedQ)$y,
indexL = density(dat$LowQ, na.rm = T)$x,
propL = density(dat$LowQ, na.rm = T)$y
)
ggplot(dat2) +
geom_line(aes(y = propH, x = indexH, colour = "High")) +
geom_line(aes(y = propL, x= indexL, colour = "Low")) +
geom_line(aes(y = propM, x = indexM, colour = "Medium"))
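
Judging only from the code above, one likely cause is that `indexH` is built from `density(dat$LowQ, na.rm = T)$x` while `propH` comes from `density(dat$HighQ)$y`, so the High curve is drawn over the Low-quality x grid. A minimal sketch of one way to avoid this kind of mix-up is to keep each density's x and y together in one long data frame. It assumes simulated stand-in data (the real `marijuana.street.price.clean` is not reproduced here) and a hypothetical helper `density_df`:

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for marijuana.street.price.clean; the values are made up
dat <- data.frame(
  HighQ = rnorm(500, mean = 330, sd = 40),
  MedQ  = rnorm(500, mean = 250, sd = 45),
  LowQ  = c(rnorm(480, mean = 200, sd = 90), rep(NA, 20))
)

# Hypothetical helper: density of one column, keeping its own x grid with its y values
density_df <- function(x, label) {
  d <- density(x, na.rm = TRUE)
  data.frame(index = d$x, prop = d$y, quality = label)
}

# Long format: one block of rows per quality level
dens <- rbind(
  density_df(dat$HighQ, "High"),
  density_df(dat$MedQ,  "Medium"),
  density_df(dat$LowQ,  "Low")
)

p <- ggplot(dens, aes(x = index, y = prop, colour = quality)) +
  geom_line()

print(p)
```

Because each curve keeps its own x grid, the three lines should land where the base `lines(density(...))` calls put them.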

Plotting two densities with vertical lines and correct legend

I want to draw two densities with two vertical lines for the averages.
There should be two legends: one for the densities and one for the vertical lines.
I tried the code below. However, only one legend appears and its labeling is wrong.
Can anyone help me?
set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50
mean.a <- mean(data$value[data$type == "a"])
mean.b <- mean(data$value[data$type == "b"])
library(ggplot2)
gp <- ggplot(data = data, aes(x = value))
gp <- gp + geom_density(aes(fill = type), color = "black", alpha=0.3, lwd = 1.0, show.legend = TRUE)
gp <- gp + scale_fill_manual(breaks = 1:2, name = "Density", values = c("a" = "green", "b" = "blue"), labels = c("a" = "Density a", "b" = "Density b") )
gp <- gp + geom_vline(aes(color="mean.a", xintercept=mean.a), linetype="solid", size=1.0, show.legend = NA)
gp <- gp + geom_vline(aes(color="mean.b", xintercept=mean.b), linetype="dashed", size=1.0, show.legend = NA)
gp <- gp + scale_color_manual(name = "", values = c("mean.a" = "red", "mean.b" = "darkblue"), labels = c("mean.a" = "Mean.A", "mean.b" = "Mean.B"))
gp <- gp + theme(legend.position="top")
gp

Here are a couple ways to do it. I'm not sure, but I think some of the difficulty comes from having more than one geom_vline and trying to hard-code values in the aes. You're building three scales here: fill for the density curves, and color and linetype for the vertical lines. But you're aiming (correct me if I'm misreading) for two legends.
The easiest way to deal with getting the proper legends is to make a small data frame for the means, rather than individual values for each mean. You can do this easily with dplyr to calculate means for each type.
library(tidyverse)
set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50
means <- group_by(data, type) %>%
summarise(mean = mean(value))
means
#> # A tibble: 2 x 2
#> type mean
#> <fct> <dbl>
#> 1 a 50.3
#> 2 b 99.9
Then when you plot, you can make a single geom_vline call, assigning the means data frame and allowing the aesthetics you want—color and linetype—to be scaled based on this data. The trick then is reconciling the names and labels: if you don't set the same legend name and labels for both the color and linetype scales, you'll have two legends for the lines. Set them the same, and you get a single legend for the mean lines.
ggplot(data, aes(x = value)) +
geom_density(aes(fill = type), alpha = 0.3) +
geom_vline(aes(xintercept = mean, color = type, linetype = type), data = means) +
scale_color_manual(values = c("red", "darkblue"), labels = c("Mean.A", "Mean.B"), name = NULL) +
scale_linetype_discrete(labels = c("Mean.A", "Mean.B"), name = NULL) +
scale_fill_manual(values = c(a = "green", b = "blue"), name = "Density")
The second way is to just add a step to creating the means data frame where you label the types the way you want later, i.e. "Mean.A" instead of just "a". Then you don't need to adjust labels, and you can skip the linetype scale—unless you want to change linetypes manually—and then just remove the name for that legend for both color and linetype in your labs.
means2 <- group_by(data, type) %>%
summarise(mean = mean(value)) %>%
mutate(type = paste("Mean", str_to_upper(type), sep = "."))
means2
#> # A tibble: 2 x 2
#> type mean
#> <chr> <dbl>
#> 1 Mean.A 50.3
#> 2 Mean.B 99.9
ggplot(data, aes(x = value)) +
geom_density(aes(fill = type), alpha = 0.3) +
geom_vline(aes(xintercept = mean, color = type, linetype = type), data = means2) +
scale_color_manual(values = c(Mean.A = "red", Mean.B = "darkblue")) +
scale_fill_manual(values = c(a = "green", b = "blue"), name = "Density") +
labs(color = NULL, linetype = NULL)
Created on 2018-06-05 by the reprex package (v0.2.0).
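
As a side note on the original attempt: the fill legend most likely disappeared because of `breaks = 1:2`, which matches neither "a" nor "b" and so leaves that legend without keys. A minimal sketch of the same plot without that argument follows, using base R `aggregate` in place of dplyr; the `label` column is an illustrative name.

```r
library(ggplot2)

set.seed(1234)
data <- data.frame(
  value = rnorm(n = 10000, mean = 50, sd = 20),
  type  = sample(letters[1:2], size = 10000, replace = TRUE)
)
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50

# One row per mean, computed with base R
means <- aggregate(value ~ type, data = data, FUN = mean)
means$label <- paste0("Mean.", toupper(means$type))

gp <- ggplot(data, aes(x = value)) +
  geom_density(aes(fill = type), colour = "black", alpha = 0.3) +
  # No `breaks = 1:2` here, so the keys for "a" and "b" stay in the fill legend
  scale_fill_manual(name = "Density",
                    values = c(a = "green", b = "blue"),
                    labels = c("Density a", "Density b")) +
  geom_vline(data = means,
             aes(xintercept = value, colour = label, linetype = label)) +
  # Same name and same keys on both scales, so they share one legend
  scale_colour_manual(name = NULL,
                      values = c(Mean.A = "red", Mean.B = "darkblue")) +
  scale_linetype_manual(name = NULL,
                        values = c(Mean.A = "solid", Mean.B = "dashed")) +
  theme(legend.position = "top")

print(gp)
```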
