ggplot geom_sf: create custom borders between multiple areas - r

I want to create borders between multiple sub-parts of my map, like in this answer:
https://stackoverflow.com/a/49523256/9829458
However, it does not work in my R session: only the external border of the map is drawn. When I use this code on my own dataset, I have the same problem...
My data:
city_code Name Long Lat Groups
<chr> <chr> <dbl> <dbl> <dbl>
1 34001 ABEI… 724751. 6262333. 9
2 34002 ADIS… 734961. 6270688. 10
3 34003 AGDE 739245. 6245728. 7
4 34004 AGEL 688135. 6249905. 4
5 34005 AGON… 758530. 6311345. 20
6 34006 AIGNE 683215. 6247000. 4
7 34007 AIGU… 685638. 6249976. 4
8 34008 LES … 705573. 6274482. 6
9 34009 ALIG… 727555. 6263258. 9
10 34010 ANIA… 747789. 6287511. 18
My map:
library(sf)
library(dplyr)
library(ggplot2)
read_sf("Map.shp") %>%
mutate(Groups = factor(Groups, levels = 1:23)) %>%
ggplot() +
geom_sf(aes(fill = Groups)) +
theme_bw()
In my case, I want to draw the "Groups" borders on my map while still seeing the city borders (and keeping the fill = Groups colors).

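The solution was added to the original post:
library(sf)
library(dplyr)
library(ggplot2)
read_sf("Map.shp") %>%
mutate(Groups = factor(Groups, levels = 1:23)) %>%
ggplot() +
# linewidth replaces size for lines/borders since ggplot2 3.4.0 (use size on older versions)
geom_sf(aes(fill = Groups), linewidth = 0.4) +
# data = . %>% ... is a function applied to the plot data; summarise() on a
# grouped sf object unions the city polygons of each group into one shape
geom_sf(fill = "transparent", color = "black", linewidth = 1, data = . %>% group_by(Groups) %>% summarise()) +
theme_bw()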
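Since Map.shp isn't available, here is a minimal reproducible sketch of the same dissolve-and-overlay idea, using the North Carolina shapefile that ships with the sf package. The four groups are hypothetical (west-to-east bands invented only so there is something to dissolve), the projection to EPSG:32119 is an assumption made so that unions are computed on planar coordinates, and linewidth assumes ggplot2 >= 3.4.0.

```r
library(sf)
library(dplyr)
library(ggplot2)

# North Carolina counties shipped with sf, projected to NAD83 / North Carolina
# (EPSG:32119) so the polygon unions are done on planar coordinates.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  st_transform(32119)

# Hypothetical grouping for illustration only: four west-to-east bands,
# based on each county's minimum x coordinate.
xmin <- vapply(st_geometry(nc), function(g) st_bbox(g)[["xmin"]], numeric(1))
nc$Groups <- factor(cut(xmin, breaks = 4, labels = FALSE))

# Dissolve the counties into one (multi)polygon per group:
# summarise() on a grouped sf object unions the geometries.
group_borders <- nc %>%
  group_by(Groups) %>%
  summarise()

p <- ggplot() +
  geom_sf(data = nc, aes(fill = Groups), linewidth = 0.2) +                    # county borders
  geom_sf(data = group_borders, fill = NA, colour = "black", linewidth = 1) +  # group borders
  theme_bw()
```

Computing group_borders as a separate object, rather than with the inline `. %>%` function, makes it easy to inspect the dissolved shapes before plotting.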

Related

Working with the ggalluvial / ggsankey libraries with missing combinations and dropouts

I'm trying to represent the movements of patients between several treatment groups measured in 3 different years. However, there are dropouts: some patients from the 1st year are missing in the 2nd year, or there are patients in the 2nd year who weren't in the 1st. The same happens in the 3rd year. I have a label called "none" for these combinations, but I don't want it to appear in the plot.
An example plot with only 2 years:
EDIT
I have tried with geom_sankey as well (https://rdrr.io/github/davidsjoberg/ggsankey/man/geom_sankey.html).
It is closer to what I'm looking for, but I don't know how to omit the stratum groups without labels (NA). In this case, I'm using my full data, not a dummy example. I can't share it, but I can try to create an example if needed. This is the code I've tried:
data = bind_rows(data_2015,data_2017,data_2019) %>%
select(sip, Year, Grp) %>%
mutate(Grp = factor(Grp), Year = factor(Year)) %>%
arrange(sip) %>%
pivot_wider(names_from = Year, values_from = Grp)
df_sankey = data %>% make_long(`2015`,`2017`,`2019`)
ggplot(df_sankey, aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node),
label = node,
color=factor(node) )) +
geom_sankey(flow.alpha = 0.5, node.color = 1) +
geom_sankey_label(size = 3.5, color = 1, fill = "white") +
scale_fill_viridis_d() +
scale_colour_viridis_d() +
theme_sankey(base_size = 16) +
theme(legend.position = "none") + xlab('')
Any idea how to omit the missing groups as a stratum in each year (without omitting them from the alluvium) would be super helpful. Thanks!
Solved! The solution was much easier than I thought. I'll leave it here in case someone else struggles with a similar problem.
Create a wide table with one row per patient (sip) and one column per cohort year, holding each patient's group:
# Data with 3 cohorts for years 2015, 2017 and 2019
# Grp is a factor with 6 levels: 1 to 6
# sip is a unique ID
library(tidyverse)
data_wide = data %>%
select(sip, Year, Grp) %>%
mutate(Grp = factor(Grp, levels=c(1:6)), Year = factor(Year)) %>%
arrange(sip) %>%
pivot_wider(names_from = Year, values_from = Grp)
The ggsankey package expects data in a specific long format, and it already provides a helper for this, make_long():
library(ggsankey)
df_sankey = data_wide %>% make_long(`2015`,`2017`,`2019`)
# The tibble records every transition along the x axis together with the categorical y value (node):
> head(df_sankey)
# A tibble: 6 × 4
x node next_x next_node
<fct> <chr> <fct> <chr>
1 2015 3 2017 2
2 2017 2 2019 2
3 2019 2 NA NA
4 2015 NA 2017 1
5 2017 1 2019 1
6 2019 1 NA NA
It looks like passing the pivot_wider() output to make_long() produces every combination for every value, filling the missing ones with NA. Drop the NA values in node and create the plot:
df_sankey %>% drop_na(node) %>%
ggplot(aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node),
label = node,
color=factor(node) )) +
geom_sankey(flow.alpha = 0.5, node.color = 1) +
geom_sankey_label(size = 3.5, color = 1, fill = "white") +
scale_fill_viridis_d() +
scale_colour_viridis_d() +
theme_sankey(base_size = 16) +
theme(legend.position = "none") + xlab('')
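To see where the NA nodes come from, here is a small sketch with invented toy data. It uses only dplyr/tidyr and builds a long table with the same four columns that make_long() returns (x, node, next_x, next_node); this hand-rolled version is an approximation for illustration, not ggsankey's own implementation.

```r
library(dplyr)
library(tidyr)

# Invented toy cohort data: patient 2 is absent in 2015,
# patient 3 drops out after 2015.
toy <- tibble(
  sip  = c(1, 1, 1, 2, 2, 3),
  Year = c(2015, 2017, 2019, 2017, 2019, 2015),
  Grp  = c(3, 2, 2, 1, 1, 4)
)

# One row per patient, one column per year: absent combinations become NA.
wide <- toy %>% pivot_wider(names_from = Year, values_from = Grp)

# Hand-rolled long table with make_long()-style columns (a sketch only).
years <- c("2015", "2017", "2019")
long <- bind_rows(lapply(seq_along(years), function(i) {
  last <- i == length(years)
  tibble(
    x         = years[i],
    node      = as.character(wide[[years[i]]]),
    next_x    = if (last) NA_character_ else years[i + 1],
    next_node = if (last) NA_character_ else as.character(wide[[years[i + 1]]])
  )
}))

# Rows whose node is NA are the unlabeled strata; drop_na(node) removes them.
long_clean <- long %>% drop_na(node)
```

Each NA cell in wide becomes one NA node in long, which is why filtering on node is enough to remove the empty strata.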

Plot multiple variables in the same bar plot

My data frame looks like this (1322 rows in total):
I'd like to make a bar plot showing the percentage of each rating of the CFS score. It should look similar to this:
With this code, I can make a single bar plot for the column cfs_triage:
ggplot(data = df) +
geom_bar(mapping = aes(x = cfs_triage, y = after_stat(count / sum(count))))
But I can't figure out how to make one with the three variables next to one another.
Thank you in advance to anyone who can help me make this bar plot with the percentage of each rating for these three variables! (I'm not sure my explanation is very clear, but I hope it is. :))
Your best bet here is to pivot your data into long format. We don't have your data, but we can reproduce a similar data set like this:
set.seed(1)
df <- data.frame(cfs_triage = sample(10, 1322, TRUE, prob = 1:10),
cfs_silver = sample(10, 1322, TRUE),
cfs_student = sample(10, 1322, TRUE, prob = 10:1))
df[] <- lapply(df, function(x) { x[sample(1322, 300)] <- NA; x})
Now the dummy data set looks a lot like yours:
head(df)
#> cfs_triage cfs_silver cfs_student
#> 1 9 NA 1
#> 2 8 4 2
#> 3 NA 8 NA
#> 4 NA 10 9
#> 5 9 5 NA
#> 6 3 1 NA
If we pivot into long format, then we will end up with two columns: one containing the values, and one containing the column name that the value belonged to in the original data frame:
library(tidyverse)
df_long <- df %>%
pivot_longer(everything())
head(df_long)
#> # A tibble: 6 x 2
#> name value
#> <chr> <int>
#> 1 cfs_triage 9
#> 2 cfs_silver NA
#> 3 cfs_student 1
#> 4 cfs_triage 8
#> 5 cfs_silver 4
#> 6 cfs_student 2
This then allows us to plot with value on the x axis, and we can use name as a grouping / fill variable:
ggplot(df_long, aes(value, fill = name)) +
geom_bar(position = 'dodge') +
scale_fill_grey(name = NULL) +
theme_bw(base_size = 16) +
scale_x_continuous(breaks = 1:10)
#> Warning: Removed 900 rows containing non-finite values (`stat_count()`).
Created on 2022-11-25 with reprex v2.0.2
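The plot above shows counts, while the question asks for percentages. A minimal sketch, assuming the same dummy df, that computes the share of each rating within each variable (NAs excluded) before plotting with geom_col():

```r
library(tidyverse)

# Same dummy data as above: three CFS columns with 300 NAs each.
set.seed(1)
df <- data.frame(cfs_triage  = sample(10, 1322, TRUE, prob = 1:10),
                 cfs_silver  = sample(10, 1322, TRUE),
                 cfs_student = sample(10, 1322, TRUE, prob = 10:1))
df[] <- lapply(df, function(x) { x[sample(1322, 300)] <- NA; x })

# Percentage of each rating within each variable, ignoring NAs.
df_pct <- df %>%
  pivot_longer(everything()) %>%
  drop_na(value) %>%
  count(name, value) %>%
  group_by(name) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

p <- ggplot(df_pct, aes(value, pct, fill = name)) +
  geom_col(position = position_dodge(preserve = "single")) +
  scale_fill_grey(name = NULL) +
  scale_x_continuous(breaks = 1:10) +
  scale_y_continuous(labels = scales::percent) +
  theme_bw(base_size = 16)
```

Computing the percentages explicitly makes it clear that each variable's bars sum to 100%, which is harder to see when the proportion is computed inside the stat.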
Maybe you need something like this (the formatting was taken from @Allan Cameron, many thanks!):
library(tidyverse)
library(scales)
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id) %>%
group_by(id) %>%
mutate(percent = value/sum(value, na.rm = TRUE)) %>%
mutate(percent = ifelse(is.na(percent), 0, percent)) %>%
mutate(my_label = str_trim(paste0(format(100 * percent, digits = 1), "%"))) %>%
ggplot(aes(x = factor(name), y = percent, fill = factor(name), label = my_label))+
geom_col(position = position_dodge())+
geom_text(aes(label = my_label), vjust=-1) +
facet_wrap(. ~ id, nrow=1, strip.position = "bottom")+
scale_fill_grey(name = NULL) +
scale_y_continuous(labels = scales::percent)+
theme_bw(base_size = 16)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Plotting dodged periodic time series

I have some data about events happening at some hours of the day in certain conditions.
The data frame looks something like this:
> tibble(event_id = 1:1000, hour = rep_len(0:23, 1000), conditions = rep_len(c("Non", "Oui"), 1000))
# A tibble: 1,000 × 3
event_id hour conditions
<int> <int> <chr>
1 1 0 Non
2 2 1 Oui
3 3 2 Non
4 4 3 Oui
5 5 4 Non
6 6 5 Oui
7 7 6 Non
8 8 7 Oui
9 9 8 Non
10 10 9 Oui
I have managed to represent it with geom_bar this way:
mydataframe %>%
group_by(hour, conditions) %>%
count() %>%
ggplot() +
geom_bar(aes(x = hour, y = n, fill = conditions), stat = "identity", position = "dodge")
With my actual data, I get a figure like this:
But I would like to get something like two dodged smoothed lines, or a geom_density, which I can't seem to get.
Do you have any ideas to help me?
Thank you
library(tidyverse)
set.seed(42)
mydataframe <- tibble(event_id = 1:1000, hour = rep_len(0:23, 1000), conditions = sample(c("Non", "Oui"), 1000, replace = TRUE))
mydataframe %>%
count(hour, conditions) %>%
ggplot() +
geom_smooth(aes(hour, n, color = conditions), se = FALSE, span = 0.3)
Or, if you want to dodge them, you could do this and tweak the dodge width between the series:
mydataframe %>%
count(hour, conditions) %>%
ggplot() +
geom_smooth(aes(hour, n, color = conditions), se = FALSE, span = 0.3,
position = position_dodge(width = 1))
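For the geom_density variant, note that hour is periodic (23 is next to 0), and a plain kernel density tends to underestimate near the edges of the range. A minimal sketch of one common workaround, assuming the same simulated mydataframe: pad the data with copies shifted by -24 and +24 hours, estimate the density, then zoom back to 0-23 with coord_cartesian() so the padded rows are not dropped.

```r
library(tidyverse)

set.seed(42)
mydataframe <- tibble(event_id = 1:1000, hour = rep_len(0:23, 1000),
                      conditions = sample(c("Non", "Oui"), 1000, replace = TRUE))

# Pad with copies shifted one day back and forward so the kernel
# "sees" across midnight.
wrapped <- bind_rows(
  mutate(mydataframe, hour = hour - 24),
  mydataframe,
  mutate(mydataframe, hour = hour + 24)
)

p <- ggplot(wrapped, aes(hour, colour = conditions)) +
  geom_density(adjust = 0.5) +
  coord_cartesian(xlim = c(0, 23)) +          # zoom without removing data
  scale_x_continuous(breaks = seq(0, 23, by = 3))
```

Because the padded density integrates to 1 over -24 to 47, its heights are about one third of an unpadded density; only the shape should be read from this plot.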

specific fill order in geom_area

Happy new year! Consider this simple example:
> df <- tibble(type = c('0_10','0_9','0_8','0_10','0_9','0_8','1_10','1_9','1_8','1_10','1_9','1_8'),
+ time = c(1,1,1,2,2,2,1,1,1,2,2,2),
+ value = c(2,3,4,2,3,6,-2,-3,-4,-2,-3,-5))
> df
# A tibble: 12 x 3
type time value
<chr> <dbl> <dbl>
1 0_10 1 2
2 0_9 1 3
3 0_8 1 4
4 0_10 2 2
5 0_9 2 3
6 0_8 2 6
7 1_10 1 -2
8 1_9 1 -3
9 1_8 1 -4
10 1_10 2 -2
11 1_9 2 -3
12 1_8 2 -5
I am creating a plot that stacks the values of value over time, by type. I would like to obtain the same color for the type 0_10 and 1_10, another color for 0_9 and 1_9 and another color for 0_8 and 1_8.
Unfortunately, ggplot does not seem to use the factor ordering I am asking for. You can see below that 0_10 is purple while 1_10 is green... They should have the same color.
library(tidyverse)
mylevels = c('0_10','0_9','0_8','1_10','1_9','1_8')
df %>%
ggplot(aes(x = time)) +
geom_area(data = . %>% dplyr::filter(str_detect(type, '0_')),
aes(y = value,
fill = factor(type, levels = mylevels)),
position = 'stack', color = 'black')+
scale_fill_viridis_d() +
geom_area(data = . %>% dplyr::filter(str_detect(type, '1_')),
aes(y = value, fill = factor(type, levels = mylevels)),
position = 'stack', color = 'black')
Any idea?
Thanks!
I'd create a new column from the trailing number of your type column to use as the fill, and group by type.
library(tidyverse)
df <- tibble(type = c('0_10','0_9','0_8','0_10','0_9','0_8','1_10','1_9','1_8','1_10','1_9','1_8'), time = c(1,1,1,2,2,2,1,1,1,2,2,2), value = c(2,3,4,2,3,6,-2,-3,-4,-2,-3,-5))
df %>%
mutate(colvar = sub("^.*_", "", type)) %>%   # drop everything up to the last "_"
ggplot(aes(x = time)) +
geom_area(aes(y = value,
fill = colvar,
group = type),
position = 'stack', color = 'black') +
scale_fill_viridis_d()
If you also want to show the first number of type, you need to map it to another aesthetic, for example alpha.
Update: colvar is now a factor with ordered levels.
df %>%
mutate(colvar = factor(sub("^.*_", "", type), levels = 8:10),
alphavar = as.integer(substr(type, 1, 1))) %>%
ggplot(aes(x = time)) +
geom_area(aes(y = value,
fill = colvar,
group = type,
alpha = alphavar),
position = 'stack', color = 'black') +
scale_fill_viridis_d() +
scale_alpha_continuous(breaks = 0:1, range = c(0.7,1) )
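An alternative that keeps fill = type is to give scale_fill_manual() a named vector of colours in which 0_10 and 1_10 (and likewise 0_9/1_9 and 0_8/1_8) share a value. A minimal sketch; the hex codes are hand-picked (the three viridis end and mid colours) and the legend is restricted to one entry per colour:

```r
library(tidyverse)

df <- tibble(type = c('0_10','0_9','0_8','0_10','0_9','0_8','1_10','1_9','1_8','1_10','1_9','1_8'),
             time = c(1,1,1,2,2,2,1,1,1,2,2,2),
             value = c(2,3,4,2,3,6,-2,-3,-4,-2,-3,-5))

# One colour per trailing number of type.
pal <- c("10" = "#440154", "9" = "#21908C", "8" = "#FDE725")

# Expand the palette to every type, so e.g. 0_10 and 1_10 get the same colour.
types <- unique(df$type)
type_cols <- setNames(unname(pal[sub("^.*_", "", types)]), types)

p <- ggplot(df, aes(time, value, fill = type)) +
  # position_stack() stacks positive and negative values separately
  geom_area(position = "stack", colour = "black") +
  scale_fill_manual(values = type_cols,
                    breaks = c("0_10", "0_9", "0_8"),   # show each colour once
                    labels = c("10", "9", "8"))
```

Matching colours by name, rather than by factor position, is what guarantees the pairing regardless of level order.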

R - (ggplot2 library) - Legends not showing on graphs

What I'm doing
I'm using the R library ggplot2, which offers many options for creating graphics. I'm using it to display two different data sets on one graph, with a different colour for each data set.
The Problem
I'm also trying to get a legend to show up in my graph that will tell the user which data set corresponds to which colour. So far, I haven't been able to get it to show.
What I've tried
I've set legend.position to top/bottom/left/right to make sure nothing was setting it to "none" by default, which would have hidden it.
The Code
# PDF/Plot generation
pdf("activity-plot.pdf")
ggplot(data.frame("Time"=times), aes(x=Time)) +
#Data Set 1
geom_density(fill = "#1A3552", colour = "#4271AE", alpha = 0.8) +
geom_text(x=mean(times)-1, y=max(density(times)$y/2), label="Mean {1} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(times)), color="cyan", linetype="dashed", size=1, alpha = 0.5) +
# Data Set 2
geom_density(data=data.frame("Time"=timesSec), fill = "gray", colour = "orange", alpha = 0.8) +
geom_text(x=mean(timesSec)-1, y=max(density(timesSec)$y/2), label="Mean {2} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(timesSec)), color="orange", linetype="dashed", size=1, alpha = 0.5) +
# Main Graph Info
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
dev.off()
As pointed out by @Ben, you need to map the colour inside aes() to get a legend displayed.
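Applied to the question's two-vector code, a minimal sketch of that fix: map a constant label to fill and colour inside aes() in each geom_density() layer, then set the actual colours with manual scales. times and timesSec are random stand-ins here, since the real data isn't available.

```r
library(ggplot2)

# Random stand-ins for the question's vectors (illustration only).
set.seed(1)
times    <- runif(100, 0, 24)
timesSec <- runif(100, 0, 24)

# A constant string inside aes() becomes a legend key for that layer;
# the manual scales translate each key into the original colours.
p <- ggplot() +
  geom_density(data = data.frame(Time = times),
               aes(x = Time, fill = "Data set 1", colour = "Data set 1"), alpha = 0.8) +
  geom_density(data = data.frame(Time = timesSec),
               aes(x = Time, fill = "Data set 2", colour = "Data set 2"), alpha = 0.8) +
  scale_fill_manual(name = NULL,
                    values = c("Data set 1" = "#1A3552", "Data set 2" = "gray")) +
  scale_colour_manual(name = NULL,
                      values = c("Data set 1" = "#4271AE", "Data set 2" = "orange")) +
  theme(legend.position = "top")

built <- ggplot_build(p)
```

Setting colours as fixed arguments outside aes() (as in the question) bypasses the scales entirely, which is why no legend was generated.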
However, a better approach is to put your two variables, "Time" and "Timesec", into a single data frame and reshape it into a longer format. To illustrate this, I created this dummy data frame:
Time = sample(1:24, 200, replace = TRUE)
Timesec = sample(1:24, 200, replace = TRUE)
df <- data.frame(Time, Timesec)
Time Timesec
1 22 23
2 21 9
3 19 9
4 10 6
5 7 24
6 15 9
... ... ...
So, the first step is to reshape your data frame into a longer format. Here, I'm using the pivot_longer function from the tidyr package:
library(tidyr)
library(dplyr)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val")
# A tibble: 400 x 2
var val
<chr> <int>
1 Time 22
2 Timesec 23
3 Time 21
4 Timesec 9
5 Time 19
6 Timesec 9
7 Time 10
8 Timesec 6
9 Time 7
10 Timesec 24
# … with 390 more rows
To add geom_vline and geom_text based on the mean of each variable, an easy way is to create a second data frame holding the mean and the maximum density value for each variable:
library(tidyr)
library(dplyr)
df_lab <- df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
group_by(var) %>%
summarise(Mean = mean(val),
Density = max(density(val)$y))
# A tibble: 2 x 3
var Mean Density
<chr> <dbl> <dbl>
1 Time 11.6 0.0555
2 Timesec 12.1 0.0517
So, using df and df_lab, you can generate the entire plot. Here, we map colour and fill inside aes() and use scale_color_manual and scale_fill_manual to set the appropriate colours:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
ggplot(aes(x = val, fill = var, colour = var))+
geom_density(alpha = 0.8)+
scale_color_manual(values = c("#4271AE", "orange"))+
scale_fill_manual(values = c("#1A3552", "gray"))+
geom_vline(inherit.aes = FALSE, data = df_lab,
aes(xintercept = Mean, color = var), linetype = "dashed", linewidth = 1, # size = 1 on ggplot2 < 3.4.0
show.legend = FALSE)+
geom_text(inherit.aes = FALSE, data = df_lab,
aes(x = Mean-0.5, y = Density/2, label = var, color = var), angle = 90,
show.legend = FALSE)+
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
Does this answer your question?
