How to subset with ggplot2 without removing the shapes?

Hi all, I'm working with ggplot2 to create a geographic representation of my country. This is the dataset and the script I'm using (prov2022 is the shapefile for the map):
#database
COD_REG COD_PROV Wage
1 91 530
1 92 520
1 93 410
2 97 300
2 98 205
2 99 501
13 102 700
13 103 800
13 159 900
18 162 740
18 123 590
18 119 420
19 162 340
19 123 290
19 119 120
#script
right_join(prov2022, database, by = "COD_PROV") %>%
ggplot(aes(fill = `Wage`))+
geom_sf(data = ~ subset(., `Wage` > 300 & `Wage` <= 800)) +
theme_void() +
theme(legend.title=element_blank())+
scale_fill_gradientn(colors = c( 'white', 'yellow' , 'red', 'black')) +
geom_blank()
It works fine, but I'm also interested in visualizing the shapes of the areas that I've excluded with subset. My goal was to fill with the color gradient only the regions with Wage > 300 & Wage <= 800, but with geom_sf(data = ~ subset(., Wage > 300 & Wage <= 800)) the regions that don't satisfy this condition are removed from my map completely. I need them in the output, but without being filled (just their shapes).
How can I solve this?
UPDATE 1: script
This is what I'm using after @r2evans's suggestion:
right_join(prov2022, database, by = "COD_PROV") %>%
ggplot(aes(fill = `Importo medio mensile`))+
geom_sf(data = ~ transform(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Importo medio mensile`[NA], `Importo medio mensile`))) +
theme_void() +
theme(legend.title=element_blank())+
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'), na.value = "#00000000") +
geom_blank()
but I get this error:
Error in FUN(X[[i]], ...) : object 'Importo medio mensile' not found
UPDATE 2
What should I do if I want to fill using another variable, Salario Reale, while keeping the selection of the areas based on the previous variable, Importo medio mensile?
Substituting only the fill variable doesn't work:
right_join(prov2022, database, by = "COD_PROV") %>%
ggplot(aes(fill = `Salario Reale`))+
geom_sf(data = ~ dplyr::mutate(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Importo medio mensile`, `Importo medio mensile`[NA]))) +
theme_void() +
theme(legend.title=element_blank())+
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'), na.value = "#00000000") +
geom_blank()
It colors all the regions of my country as if the selection based on Importo medio mensile weren't there. How can I solve this?
UPDATE 3
The solution proposed by r2evans works!!
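For UPDATE 2: in that script the mutate() call masks `Importo medio mensile`, but the fill aesthetic is `Salario Reale`, so the values actually being coloured are never set to NA. Below is a minimal base-R sketch of the idea (not necessarily the exact fix r2evans suggested), assuming a small hypothetical data frame in which only the column names come from the question: the condition is evaluated on one column, and the NA mask is applied to the column mapped to fill.

```r
# Hypothetical toy data: only the column names come from the question.
provinces <- data.frame(
  COD_PROV = c(91, 92, 93, 97),
  `Importo medio mensile` = c(1450, 1550, 1650, 1800),
  `Salario Reale` = c(1200, 1300, 1400, 1500),
  check.names = FALSE  # keep the spaces in the column names
)

# The selection is decided by one column ...
in_range <- provinces[["Importo medio mensile"]] > 1500 &
  provinces[["Importo medio mensile"]] <= 1700

# ... but the NA mask is applied to the column that is mapped to fill.
salario <- provinces[["Salario Reale"]]
provinces[["Salario Reale"]] <- ifelse(in_range, salario, salario[NA])
provinces
```

In the plot this would correspond to something like data = ~ dplyr::mutate(., `Salario Reale` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Salario Reale`, `Salario Reale`[NA])), keeping geom_blank() so the fill scale still covers the full range of `Salario Reale`.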

Instead of filtering out the data, just replace (optionally inline) the not-to-be-colored values with NA.
Continuing from my previous answer (the usa example built from the maps package),
ggplot(usa, aes(fill = val)) +
geom_sf(data = ~ transform(., val = ifelse(val < 0.5, val[NA], val))) +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black')) +
geom_blank()
(The use of val[NA] is to make sure we have the one specific class of NA, as there are at least 6 different types of NA.)
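A quick base-R check of what val[NA] guarantees: indexing a vector with NA returns an NA of that vector's own type, whereas a bare NA is logical. The difference shows up when every value gets masked. The val vector here is just an example.

```r
val <- c(0.2, 0.7, 0.9)

typeof(NA)         # "logical": a bare NA has its own type
typeof(val[NA])    # "double": same type as val
typeof(NA_real_)   # "double": the explicit constant for a numeric NA

# If every value ends up masked, a bare NA leaves a logical vector,
# while val[NA] keeps the column numeric.
typeof(ifelse(val < 1, NA, val))        # "logical"
typeof(ifelse(val < 1, val[NA], val))   # "double"
```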
By default, NA values are filled grey (na.value = "grey50"), which may not be what you want; you can change that with na.value=:
ggplot(usa, aes(fill = val)) +
geom_sf(data = ~ transform(., val = ifelse(val < 0.5, val[NA], val))) +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'),
na.value = "#00000000") +
geom_blank()
where #00000000 is a fully transparent color. The first six 0s (the RGB part) don't matter; the trailing 00 is an alpha of 0 (fully transparent).
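To confirm how the eight-digit hex code is read, grDevices::col2rgb() with alpha = TRUE returns the red, green, blue and alpha channels on a 0-255 scale:

```r
# Rows of the result are red, green, blue and alpha (0-255).
transparent <- col2rgb("#00000000", alpha = TRUE)
transparent["alpha", 1]                           # 0: fully transparent

# The RGB digits are irrelevant once alpha is 0: this is also invisible.
col2rgb("#FFFFFF00", alpha = TRUE)["alpha", 1]    # 0

# A trailing 80 (hex) means alpha 128, i.e. roughly 50% opacity.
col2rgb("#FF000080", alpha = TRUE)["alpha", 1]    # 128

# The named colour "transparent" also has alpha 0.
col2rgb("transparent", alpha = TRUE)["alpha", 1]  # 0
```

Since R accepts the named colour "transparent", na.value = "transparent" should behave the same way as na.value = "#00000000".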
I think this means you want something like this:
right_join(prov2022, database, by = "COD_PROV") %>%
ggplot(aes(fill = `Importo medio mensile`))+
geom_sf(data = ~ mutate(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 300 & `Importo medio mensile` <= 800, `Importo medio mensile`, `Importo medio mensile`[NA]))) +
theme_void() +
theme(legend.title=element_blank())+
scale_fill_gradientn(colors = c( 'white', 'yellow' , 'red', 'black'), na.value = "#00000000") +
geom_blank()
Notes:
I changed `Wage` from your example to `Importo medio mensile`, as you mentioned in your comments.
My earlier code uses transform, which is base R and generally works fine, except when the column names are not syntactic R names (e.g. they contain spaces). In that case it replaces the invalid characters with dots, so `Importo medio mensile` becomes `Importo.medio.mensile`, which is why ggplot2 reports that object 'Importo medio mensile' is not found. Using dplyr::mutate avoids this problem. You're already using right_join, so I don't think this adds a dependency.
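This explains the error in the question's first update. A base-R sketch with a one-column toy data frame: transform() renames the column, while assigning with [[ ]] keeps the original name.

```r
df <- data.frame(`Importo medio mensile` = c(250, 500, 900), check.names = FALSE)

# transform() rebuilds its result with data.frame(), whose default
# check.names = TRUE turns the spaces into dots.
via_transform <- transform(df, `Importo medio mensile` = ifelse(
  `Importo medio mensile` > 300 & `Importo medio mensile` <= 800,
  `Importo medio mensile`, NA))
names(via_transform)   # "Importo.medio.mensile"

# Base-R alternative that keeps the name: assign with [[ ]].
via_index <- df
x <- via_index[["Importo medio mensile"]]
via_index[["Importo medio mensile"]] <- ifelse(x > 300 & x <= 800, x, x[NA])
names(via_index)       # "Importo medio mensile"
```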
Another way to look at this: data = ~ mutate(...) changes the data only inside that layer, leaving the original data untouched. One could get the same effect by adding the masked column before plotting:
right_join(prov2022, database, by = "COD_PROV") %>%
mutate(SOMETHING = ifelse(`Importo medio mensile` > 300 & `Importo medio mensile` <= 800, `Importo medio mensile`, `Importo medio mensile`[NA])) %>%
ggplot(aes(fill = SOMETHING)) +
geom_sf() +
theme_void() +
theme(legend.title=element_blank())+
scale_fill_gradientn(colors = c( 'white', 'yellow' , 'red', 'black')) +
geom_blank(aes(fill = `Importo medio mensile`))
Note that fill= has to be redefined in the blank geom so that ggplot trains the scale on the full range of the original values.
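Why geom_blank(aes(fill = ...)) must point at the original column: the masked column has a narrower range, and that range is what the fill scale would otherwise be built from. With the Wage values from the example table at the top of the question:

```r
# Wage column from the example table in the question.
wage <- c(530, 520, 410, 300, 205, 501, 700, 800, 900,
          740, 590, 420, 340, 290, 120)

# Values outside (300, 800] become NA, as in the mutate() above.
something <- ifelse(wage > 300 & wage <= 800, wage, wage[NA])

range(wage)                      # 120 900: what the blank geom contributes
range(something, na.rm = TRUE)   # 340 800: what the masked column alone gives
```

If you prefer not to rely on geom_blank(), continuous ggplot2 scales also take a limits argument, so something like scale_fill_gradientn(colors = ..., limits = range(database$Wage)) should give the same gradient (database being the table from the question).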

Related

How to visualize only specific geographic area with ggplot2?

I'm working with ggplot2 and creating a geographic representation of my country.
This is the dataset and the script I'm using (prov2022 is the shapefile for the map):
#dataset
COD_REG COD_PROV Wage
1 91 530
1 92 520
1 93 510
2 97 500
2 98 505
2 99 501
13 102 700
13 103 800
13 159 900
18 162 740
18 123 590
18 119 420
19 162 340
19 123 290
19 119 120
#script
right_join(prov2022, dataset, by = "COD_PROV") %>%
ggplot(aes(fill = `Wage`)) +
geom_sf() +
theme_void() +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'))
It works fine, but now I'm interested in visualizing only a specific area.
If I add a filter to select the regions with COD_REG >= 13, I get the areas I was looking for, but the color gradient changes.
right_join(prov2022, dataset, by = "COD_PROV") %>%
filter(COD_REG >= 13 ) %>%
ggplot(aes(fill = `Wage`)) +
geom_sf() +
theme_void() +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'))
The color gradient changes when I use the filter because the colors are then computed only from the values of those specific areas, not from the whole country.
As a consequence, these areas no longer have the colors they had at the beginning (i.e. in the full map produced by the first script).
I need to keep the color gradient of the whole country, but show only some specific areas in the ggplot2 output, without changing anything else.
How can I solve this?
Try this. I'll use fake data on the state map from package maps.
library(ggplot2)
library(maps)
usa <- sf::st_as_sf(map('state', plot = FALSE, fill = TRUE))
set.seed(42)
usa$val <- runif(length(usa$ID))
ggplot(usa, aes(fill = val)) +
geom_sf() +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'))
If we naively just filter the states we want to see, the colors change:
ggplot(usa, aes(fill = val)) +
geom_sf(data = ~ subset(., val > 0.5)) +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'))
If we add geom_blank(), though, the scale is computed from the full range of values. geom_blank() inherits all of the original (unfiltered) data but draws nothing; it only updates scales and limits, so unlike workarounds such as drawing transparent or super-small shapes, it adds nothing to the plot. From ?geom_blank:
The blank geom draws nothing, but can be a useful way of ensuring
common scales between different plots. See 'expand_limits()' for
more details.
Code:
ggplot(usa, aes(fill = val)) +
geom_sf(data = ~ subset(., val > 0.5)) +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black')) +
geom_blank()
Notice that I'm using inline ~ rlang-style functions for subsetting the data; this is my convention but is not required.
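The colour shift described in the question can be reproduced outside ggplot2. A continuous gradient rescales each value against the scale limits before picking a colour, so changing the limits changes the colour of the same value. The sketch below imitates that with grDevices::colorRamp() and linear rescaling; it is an approximation, not ggplot2's actual code (ggplot2 may interpolate colours differently), and the values in val are hypothetical stand-ins for usa$val.

```r
# colorRamp() returns a function mapping numbers in [0, 1] to RGB (0-255).
pal <- colorRamp(c("white", "yellow", "red", "black"))

# Rescale v to [0, 1] using the given limits, then convert to hex colours.
to_hex <- function(v, limits) {
  scaled <- (v - limits[1]) / (limits[2] - limits[1])
  m <- matrix(pal(scaled), ncol = 3)   # one row per value
  rgb(m[, 1] / 255, m[, 2] / 255, m[, 3] / 255)
}

# Hypothetical values standing in for usa$val, and the "filtered" subset.
val <- c(0.1, 0.3, 0.6, 0.9)
kept <- val[val > 0.5]

to_hex(0.6, range(val))    # colour of 0.6 on the full-data scale
to_hex(0.6, range(kept))   # "#FFFFFF": 0.6 is now the minimum, so it turns white
to_hex(0.9, range(kept))   # "#000000": the maximum maps to black
```

Colouring the kept values with the full-data limits (range(val)) reproduces the full-map colours, which is the role geom_blank() plays above by feeding the unfiltered data into the scale.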

How to color/shade the area between two lines in ggplot2?

I would like to make a plot with two lines representing observational data and data from a special model. There are also two more lines which represent the maximum and minimum of the variability of other models. The goal is to shade the area between these two lines in grey, so that the first two lines are still visible.
Here is the code to reproduce the data:
# create the data set
Month.vec <- c(1:12)
Model.vec <- c(70.33056, 58.91058, 73.40891, 74.42824, 108.45975, 125.85887, 126.02867, 102.54128, 70.66263, 61.30316, 66.04057, 75.75262)
Obs.vec <- c(62.64178, 52.39356, 63.07376, 52.87248, 70.80587, 81.85081, 88.29134, 77.22920, 67.67458, 64.74425, 63.96322, 69.89868)
Up_lim.vec <- c(83.46967, 71.27700, 86.43001, 77.62739, 108.32674, 112.61118, 125.43512, 93.71193, 80.17298, 75.01851, 79.05700, 85.40042)
Low_lim.vec <- c(76.44381, 65.19571, 74.27778, 59.91012, 82.14684, 84.09151, 77.91529, 66.21702, 60.89712, 67.85613, 72.49409, 79.13741)
df <- as.data.frame(cbind(Month.vec, Obs.vec, Model.vec, Up_lim.vec, Low_lim.vec))
colnames(df) <- c("Month", "Observation", "Model", "Upper Limit", "Lower Limit")
A "normal" plot with four lines is done quite easily:
library(tidyverse)
# plot
df %>%
as_tibble() %>%
pivot_longer(-1) %>%
ggplot(aes(Month, value, color = name)) +
scale_color_manual("",values= c("blue", "yellow", "red", "black")) +
scale_x_continuous(breaks = seq(1, 12, by = 1)) +
scale_y_continuous(breaks = seq(0, 140, by = 20)) +
ylab("Precipitation [mm]") +
geom_line() +
theme_bw()
This leads to this output:
So the idea is that the area between the black and blue lines (Upper Limit and Lower Limit) is shaded in grey or something similar, so that the red and yellow lines (Observation and Model) are still visible.
The result should look somewhat like this from this question:
I know that there are some similar questions here, most of them hinting towards using geom_ribbon.
I tried this but only received the following error message:
Error: Aesthetics must be either length 1 or the same as the data (48): ymax and ymin
Anybody with an idea how to do that?
I think it would be easier to keep the data in a wider format and then use geom_ribbon to create the shaded area:
df %>%
as_tibble() %>%
ggplot +
geom_line(aes(Month, Model, color = 'Model')) +
geom_line(aes(Month, Observation, color = 'Observation')) +
geom_ribbon(aes(Month, ymax=`Upper Limit`, ymin=`Lower Limit`), fill="grey", alpha=0.25) +
scale_x_continuous(breaks = seq(1, 12, by = 1)) +
scale_y_continuous(breaks = seq(0, 140, by = 20)) +
scale_color_manual(values = c('Model' = 'yellow','Observation' = 'red')) +
ylab("Precipitation [mm]") +
theme_bw() +
theme(legend.title = element_blank())
In case you want to keep the Upper Limit and Lower Limit lines, you could create a separate data frame for the ribbon (though admittedly it's a less elegant solution than the one above):
library(stringr)
df <- df %>%
as_tibble() %>%
pivot_longer(-1)
ribbon_df <- df %>% filter(str_detect(name, "Limit")) %>%
pivot_wider(names_from = name, values_from = value) %>%
mutate(value = `Upper Limit` ) %>%
mutate(name = "Upper Limit")
df %>% ggplot(aes(Month, value, color = name)) +
scale_color_manual("",values= c("blue", "yellow", "red", "black")) +
scale_x_continuous(breaks = seq(1, 12, by = 1)) +
scale_y_continuous(breaks = seq(0, 140, by = 20)) +
ylab("Precipitation [mm]") +
geom_ribbon(data = ribbon_df, aes(ymin = `Lower Limit`, ymax = `Upper Limit`), fill = "grey") +
geom_line() +
theme_bw()
If you're using the longer format, you need a wide table for the geom_ribbon data.
library(tidyverse)
# create the data set
Month.vec <- c(1:12)
Model.vec <- c(70.33056, 58.91058, 73.40891, 74.42824, 108.45975, 125.85887, 126.02867, 102.54128, 70.66263, 61.30316, 66.04057, 75.75262)
Obs.vec <- c(62.64178, 52.39356, 63.07376, 52.87248, 70.80587, 81.85081, 88.29134, 77.22920, 67.67458, 64.74425, 63.96322, 69.89868)
Up_lim.vec <- c(83.46967, 71.27700, 86.43001, 77.62739, 108.32674, 112.61118, 125.43512, 93.71193, 80.17298, 75.01851, 79.05700, 85.40042)
Low_lim.vec <- c(76.44381, 65.19571, 74.27778, 59.91012, 82.14684, 84.09151, 77.91529, 66.21702, 60.89712, 67.85613, 72.49409, 79.13741)
df <- as.data.frame(cbind(Month.vec, Obs.vec, Model.vec, Up_lim.vec, Low_lim.vec))
colnames(df) <- c("Month", "Observation", "Model", "UL", "LL")
df_new <- df %>%
as_tibble() %>%
pivot_longer(-1)
ggplot() +
# scale_color_manual("",values= c("blue", "yellow", "red", "black")) +
scale_x_continuous(breaks = seq(1, 12, by = 1)) +
scale_y_continuous(breaks = seq(0, 140, by = 20)) +
ylab("Precipitation [mm]") +
geom_line(data=df_new,aes(Month, value, color = name)) +
geom_ribbon(data=df, aes(x=Month, ymin = LL, ymax = UL), alpha=0.5) +
theme_bw()
which will result in this
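The error in the question ("must be either length 1 or the same as the data (48)") comes from mixing the two shapes of the data: after pivot_longer() there are 48 rows, while each limit series has only 12 values. A base-R sketch (using reshape() instead of the tidyr functions above, only so it runs without extra packages) that rebuilds the long table and then a one-row-per-month table for the ribbon:

```r
# The four series from the question, one row per month (wide format).
df <- data.frame(
  Month = 1:12,
  Observation = c(62.64178, 52.39356, 63.07376, 52.87248, 70.80587, 81.85081,
                  88.29134, 77.22920, 67.67458, 64.74425, 63.96322, 69.89868),
  Model = c(70.33056, 58.91058, 73.40891, 74.42824, 108.45975, 125.85887,
            126.02867, 102.54128, 70.66263, 61.30316, 66.04057, 75.75262),
  UL = c(83.46967, 71.27700, 86.43001, 77.62739, 108.32674, 112.61118,
         125.43512, 93.71193, 80.17298, 75.01851, 79.05700, 85.40042),
  LL = c(76.44381, 65.19571, 74.27778, 59.91012, 82.14684, 84.09151,
         77.91529, 66.21702, 60.89712, 67.85613, 72.49409, 79.13741)
)

# Long format: same rows as pivot_longer(-1), in a different order.
long <- data.frame(
  Month = rep(df$Month, times = 4),
  name  = rep(c("Observation", "Model", "UL", "LL"), each = 12),
  value = c(df$Observation, df$Model, df$UL, df$LL)
)
nrow(long)   # 48, while UL and LL each have only 12 values

# Back to one row per month for the ribbon.
ribbon <- reshape(long[long$name %in% c("UL", "LL"), ],
                  idvar = "Month", timevar = "name", direction = "wide")
names(ribbon)   # "Month" "value.UL" "value.LL"
nrow(ribbon)    # 12
```

The ribbon table could then be used as geom_ribbon(data = ribbon, aes(x = Month, ymin = value.LL, ymax = value.UL), inherit.aes = FALSE); the column names value.UL and value.LL are simply what reshape() generates here.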

Plot with geom_smooth(,) multiple colours, double y-axis with four variables in ggplot2

I have an issue with the ggplot2 plotting system in R.
I would like to plot a scatterplot + smoothing with two grades (ref) and two variables each (Vix, monomer), with Vix on the left y-axis and monomer on the right y-axis. I would like dark red and blue colours for ref at 130°C and the same but pale colours for 150°C. The colours are the following, though they are not really important for understanding the problem: '#644196', '#bba6d9', '#f92410', '#fca49c'. This way I would get 4 lines with 4 different colours.
I defined the colours with the command:
scale_color_manual(values=c('#644196', '#bba6d9', '#f92410', '#fca49c')) +
The problem is that I get 4 lines but only two colours, and the legend has only two entries (not 4 as I expected). It looks like the colours change only with ref, and no colour change is assigned to the two variables Vix and monomer.
Here is the full code:
Dati <- data.frame("Vix" = c(62500, 87000, 122000, 140000, 82700, 73000, 110000, 110000, 140300, 81500), "monomer" = c(0.089,0.08,0.095,0.1,0.111, 0.09, 0.094, 0.099, 0.111, 0.197), "Time" = c(30, 60, 90, 120, 135, 30, 60, 90, 120, 135), "ref" = c('130°C', '130°C', '130°C', '130°C', '130°C', '150°C', '150°C', '150°C', '150°C', '150°C'))
attach(Dati)
library(ggplot2)
library(readxl)
####Graph processing
scaleFactor <- max(Vix) / max(monomer)
Graph <- ggplot(Dati, aes(x= Time, col=(ref))) +
geom_point(aes(y= Vix, col=(ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y= Vix), method="loess") +
geom_point(aes(y= monomer * scaleFactor, col=ref), shape = 1, size = 3.5) +
geom_smooth(aes(y=monomer * scaleFactor), method="loess") +
scale_color_manual(values=c('#644196', '#bba6d9', '#f92410', '#fca49c')) +
scale_y_continuous(name="Vix", sec.axis=sec_axis(~./scaleFactor, name="monomer")) +
theme(
axis.title.y.left=element_text(color='#f92410'),
axis.text.y.left=element_text(color='#f92410'),
axis.title.y.right=element_text(color='#644196'),
axis.text.y.right=element_text(color='#644196')
)
Graph
Does anyone know what I could do to fix this issue?
Thanks in advance for any reply.
Probably the easiest way is to add the extra information directly in the aesthetic mapping. In the example below, paste0() prefixes ref with the series name (Vix or Monomer), so the colour variable has four distinct values.
Graph <- ggplot(Dati, aes(x= Time)) +
geom_point(aes(y= Vix, col=paste0("Vix ", ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y= Vix, col = paste0("Vix ", ref)), method="loess") +
geom_point(aes(y= monomer * scaleFactor, col=paste0("Monomer ", ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y=monomer * scaleFactor, col = paste0("Monomer ", ref)), method="loess") +
scale_color_manual(values=c('#644196', '#bba6d9', '#f92410', '#fca49c'),
name = "Series?") +
scale_y_continuous(name="Vix", sec.axis=sec_axis(~./scaleFactor, name="monomer")) +
theme(
axis.title.y.left=element_text(color='#f92410'),
axis.text.y.left=element_text(color='#f92410'),
axis.title.y.right=element_text(color='#644196'),
axis.text.y.right=element_text(color='#644196')
)
Graph
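Why this works: colour gets one legend entry and one colour per distinct value of the mapped variable. A small base-R check, using the ref column from the question's Dati:

```r
# Temperature column from the question's Dati data frame.
ref <- rep(c("130°C", "150°C"), each = 5)

# Mapping colour to ref alone gives only two groups ...
length(unique(ref))      # 2

# ... while prefixing the series name, as the paste0() calls above do,
# gives one group per series/temperature pair.
groups <- c(paste0("Vix ", ref), paste0("Monomer ", ref))
length(unique(groups))   # 4
```

ggplot2 converts a character colour variable to a factor with sorted levels, so the four manual colours are most likely assigned in the order Monomer 130°C, Monomer 150°C, Vix 130°C, Vix 150°C; with the colours listed in the question, that pairs the purples with monomer and the reds with Vix, matching the axis colours.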
You have 2 colors because the variable mapped to color (ref) has 2 distinct values. I guess you would like Vix and monomer curves for each value of ref. You can get that by reshaping your data into long format and creating a new variable that combines the temperature with Vix or monomer:
scaleFactor <- max(Dati$Vix) / max(Dati$monomer)
STEP 1: rescale monomer, reshape Vix and monomer into long format (the new name column tells you whether a value is Vix or monomer), and recreate ref by combining it with name
library(dplyr)
library(tidyr)
library(stringr)
Dati <- Dati %>%
mutate(
monomer = monomer * scaleFactor
) %>%
pivot_longer(cols = c(Vix, monomer)) %>%
mutate(ref = str_c(ref, name, sep = "-"))
STEP 2: map ref to the color aesthetic (long format is convenient for ggplot2)
ggplot(Dati, aes(Time, value, color = ordered(ref, levels = unique(ref)))) +
geom_point(shape = 1, size = 3.5) +
geom_smooth(method = "loess") +
scale_color_manual("groups", values = c('#fca49c', '#bba6d9', '#f92410', '#644196')) +
scale_y_continuous(name = "Vix", sec.axis = sec_axis(~./scaleFactor, name = "monomer")) +
theme(
axis.title.y.left = element_text(color = '#f92410'),
axis.text.y.left = element_text(color = '#f92410'),
axis.title.y.right = element_text(color = '#644196'),
axis.text.y.right = element_text(color = '#644196')
)
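Both answers keep the question's secondary-axis trick: monomer is multiplied by scaleFactor so it shares the Vix axis, and sec_axis(~./scaleFactor) divides by the same factor to label the right axis. A quick base-R check of that round trip, with the two columns copied from the question's Dati:

```r
# Columns from the question's Dati data frame.
Vix     <- c(62500, 87000, 122000, 140000, 82700, 73000, 110000, 110000, 140300, 81500)
monomer <- c(0.089, 0.08, 0.095, 0.1, 0.111, 0.09, 0.094, 0.099, 0.111, 0.197)

# Stretch monomer onto the Vix axis so that both maxima coincide.
scaleFactor <- max(Vix) / max(monomer)
plotted <- monomer * scaleFactor
all.equal(max(plotted), max(Vix))   # TRUE

# sec_axis(~ . / scaleFactor) labels the right axis with the inverse map,
# so a point drawn at height y is read back as y / scaleFactor.
back <- plotted / scaleFactor
all.equal(back, monomer)            # TRUE
```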

A heat map animation of daily deaths

I have a data frame with columns region, date, and deaths, and I've loaded the package "maps" and its 50 state map.
All of the examples I've seen ask me to merge() the data with the map. However, when I do this, I end up with an object of over 4 million rows.
The daily data is in melted8; the merged data is in melted9.
Because of the huge size of the merged data, the animate() step takes a long time to run; in fact, I stopped it after 10 minutes. I don't know if my ggplot() call is correct, but the resulting object is also huge (240 MB).
Is there a more reasonably-sized object I could give to ggplot(), and am I giving ggplot() the right instructions?
# a sample
melted8[sample(nrow(melted8), 5), ]
region date deaths
<chr> <int> <dbl>
arizona 214 7.2815030
missouri 287 0.0000000
arkansas 160 0.3313668
mississippi 53 0.0000000
new jersey 300 0.7880939
library(ggplot2)
library(gganimate)
library(maps)
us.map <- map_data("state") #50 state map from library(maps)
melted9 <- merge(us.map, melted8, by="region", all.x=T)
d <- ggplot(melted9) +
geom_polygon(aes(long,lat, group = group), color='white', fill=NA, data=us.map) +
geom_polygon(aes(long,lat, group = group, fill = deaths), color = "white") +
scale_fill_gradient(low = "gray65", high = "red") +
labs(title = "Deaths per Day") +
ease_aes("linear")
a <- animate(d, duration = 30, nframes = nrow(melted9)/50, end_pause = 5)
a
You don't have to merge the dataset with the map data if you use geom_map instead of geom_polygon.
See if this is faster for you:
layer_type.GeomMap <- function(x) 'point' # must run this line first
melted8 %>%
ggplot(aes(fill = deaths, map_id = region)) +
geom_map(map = us.map) +
expand_limits(x = us.map$long, y = us.map$lat) +
coord_fixed() +
scale_fill_gradient(low = "gray65", high = "red") +
theme(legend.position = "bottom") +
labs(title = "Deaths per Day: {closest_state}",
x = "lon", y = "lat") +
transition_states(date)
Dataset used (simulating 7 days of records for each state):
library(dplyr)
set.seed(123)
melted8 <- data.frame(region = unique(us.map$region)) %>%
mutate(date = list(seq(1, 7))) %>%
tidyr::unnest(cols = c(date)) %>%
group_by(region) %>%
mutate(deaths = abs(rnorm(n()))) %>%
ungroup()
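Why the merge explodes: merge() pairs every map point of a region with every daily row of that region, so each region contributes (points) x (days) rows. map_data("state") returns on the order of 15,000 boundary points in total, so a few hundred days of data quickly reaches millions of rows, consistent with the 4 million in the question. A toy reproduction with hypothetical sizes:

```r
# Hypothetical toy inputs: 50 boundary points per region ...
map_pts <- data.frame(
  region = rep(c("arizona", "missouri", "arkansas"), each = 50),
  long = 0, lat = 0
)
# ... and 20 days of deaths per region.
daily <- data.frame(
  region = rep(c("arizona", "missouri", "arkansas"), each = 20),
  date = rep(1:20, times = 3),
  deaths = 0
)

# Every map point is paired with every day of the same region.
merged <- merge(map_pts, daily, by = "region", all.x = TRUE)
nrow(merged)   # 3000 = 3 regions x 50 points x 20 days

# geom_map() keeps the two tables separate: the data stays at one row per
# region and date, and the shapes are looked up through map_id.
nrow(daily)    # 60
```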

How to Add Extra Labels on y-axis without Data in ggplot2

I am making a plot showing two sets of regression coefficients and standard errors; the graph is as follows:
What I want to do next is add extra labels without any data on the y-axis. For instance, put a label FeatGender above the label FeatGenderMale, or put a label FeatEU between the labels FeatPartyIDLiberalDemocrats and FeatEUIntegrationSupportEUIntegration. Below is a reduced version of the data:
coef se low high sex
1 -0.038848364 0.02104994 -0.080106243 0.002409514 Female
2 0.095831201 0.02793333 0.041081877 0.150580526 Female
3 0.050972670 0.02828353 -0.004463052 0.106408391 Female
4 -0.183558492 0.02454943 -0.231675377 -0.135441606 Female
5 0.044879447 0.02712518 -0.008285914 0.098044808 Female
6 -0.003858672 0.03005477 -0.062766024 0.055048681 Male
7 0.003048763 0.04687573 -0.088827676 0.094925203 Male
8 0.015343897 0.03948959 -0.062055700 0.092743494 Male
9 -0.132600259 0.04146323 -0.213868197 -0.051332322 Male
10 -0.029764559 0.04600719 -0.119938650 0.060409533 Male
Here is my code:
v_name <- c("FeatGenderMale", "FeatPartyIDLabourParty", "FeatPartyIDLiberalDemocrats",
"FeatEUIntegrationOpposeEUIntegration", "FeatEUIntegrationSupportEUIntegration")
t <- ggplot(temp, aes(x=c(v_name,v_name), y=coef, group=sex, colour=sex))
t +
geom_point(position = position_dodge(width = 0.3)) +
geom_errorbar(aes(ymin = low, ymax = high, width = 0), position = position_dodge(0.3)) +
coord_flip() +
scale_x_discrete(limits = rev(v_name)) +
geom_hline(yintercept = 0.0, linetype = "dotted") +
theme(legend.position = "bottom")
Thanks for the help!
Here's an approach that first adds v_name to the source data frame as a column, and then uses a longer version of the v_name vector, with the extra labels inserted, for the axis.
library(ggplot2); library(dplyr)
# Add the v_name into the table
temp2 <- temp %>% group_by(sex) %>% mutate(v_name = v_name) %>% ungroup()
# Make the dummy label for axis with add'l entries
v_name2 <- append(v_name, "FeatGender", after = 0)
v_name2 <- append(v_name2, "FeatEU", after = 4)
# Plot using the new table
t <- ggplot(temp2, aes(x=v_name, y=coef, group=sex, colour=sex))
t +
geom_point(position = position_dodge(width = 0.3)) +
geom_errorbar(aes(ymin = low, ymax = high, width = 0), position = position_dodge(0.3)) +
coord_flip() +
# ... but use the larger list of axis names
scale_x_discrete(limits = rev(v_name2)) +
geom_hline(yintercept = 0.0, linetype = "dotted") +
theme(legend.position = "bottom")
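The two append() calls rely on the after argument: after = 0 inserts at the front, and after = 4 inserts after the fourth element. The axis trick works because a discrete scale's limits may contain values that never occur in the data; a factor with extra levels behaves analogously, as this base-R sketch shows:

```r
v_name <- c("FeatGenderMale", "FeatPartyIDLabourParty", "FeatPartyIDLiberalDemocrats",
            "FeatEUIntegrationOpposeEUIntegration", "FeatEUIntegrationSupportEUIntegration")

# after = 0 puts the label first; after = 4 puts it after the 4th entry.
v_name2 <- append(v_name, "FeatGender", after = 0)
v_name2 <- append(v_name2, "FeatEU", after = 4)
v_name2

# Levels with no observations are kept, with a count of 0.
counts <- table(factor(rep(v_name, 2), levels = v_name2))
counts
```

rev() is needed in the answer because, after coord_flip(), the first limit ends up at the bottom of the axis.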
