I'm learning to use ggplot to plot my data. I found many examples such as ggplot multiple grouping bar and Grouped bar plot in ggplot. However, I have not been able to adapt them to my data.
This is what the sample looks like:
# A tibble: 10 x 3
clusterNum Road period
<dbl> <chr> <chr>
1 2 Hualampong 06.00-06.15
2 2 Hualampong 06.00-06.15
3 2 Hualampong 06.16-06.30
4 2 Hualampong 06.16-06.30
5 2 Hualampong 06.16-06.30
6 3 Hualampong 06.16-06.30
7 2 Hualampong 06.16-06.30
8 3 Tonglor 17.46-18.00
9 3 Tonglor 17.46-18.00
10 3 Tonglor 17.46-18.00
data <- structure(list(clusterNum = c(2, 2, 2, 2, 2, 3, 2, 3, 3, 3),Road = c("Hualampong", "Hualampong", "Hualampong", "Hualampong","Hualampong", "Hualampong", "Hualampong", "Tonglor", "Tonglor","Tonglor"), period = c("06.00-06.15", "06.00-06.15", "06.16-06.30","06.16-06.30", "06.16-06.30", "06.16-06.30", "06.16-06.30","17.46-18.00", "17.46-18.00", "17.46-18.00")), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))
As you can see from my data, I want to create bar charts showing the total of the clusterNum column for each period, separately for each value of the Road column. So I would end up with two graphs, one per Road.
My expected graph may look like this
Thank you for any help.
Or if you're looking for separate graphs, you can use facet_wrap:
library(tidyverse)
data2 <- data %>% group_by(period, Road) %>% summarise(clusterNum = sum(clusterNum))
ggplot(data2, aes(x = period, y = clusterNum, fill = period)) +
geom_bar(position = "dodge", stat = "identity") +
facet_wrap(~Road)
With an additional breakout by clusterNum:
library(tidyverse)
data3 <- data %>% group_by(period, Road, clusterNum) %>%
count() %>%
data.frame()
# Keep n numeric so the bar heights are true counts; only the fill variable needs to be a factor
data3$clusterNum <- as.factor(data3$clusterNum)
ggplot(data3, aes(x = period, y = n, fill = clusterNum)) +
geom_bar(position = "dodge", stat = "identity") +
facet_wrap(~Road) +
theme_minimal()
Maybe something like this:
library(tidyverse)
data1 <- data %>%
group_by(clusterNum, Road, period) %>%
count()
ggplot(data1, aes(x=period, y=n, group=clusterNum)) +
geom_bar(aes(fill = Road),
position = "dodge",
stat = "identity")
I'm trying to represent the movements of patients between several treatment groups measured in 3 different years. However, there're dropouts where some patients from 1st year are missing in the 2nd year or there are patients in the 2nd year who weren't in the 1st. Same for 3rd year. I have a label called "none" for these combinations, but I don't want it to be in the plot.
An example plot with only 2 years:
EDIT
I have tried with geom_sankey as well (https://rdrr.io/github/davidsjoberg/ggsankey/man/geom_sankey.html).
Although this is closer to what I'm looking for, I don't know how to omit the stratum groups without labels (NA). In this case, I'm using my full data, not a dummy example. I can't share it, but I can try to create an example if needed. This is the code I've tried:
data = bind_rows(data_2015,data_2017,data_2019) %>%
select(sip, Year, Grp) %>%
mutate(Grp = factor(Grp), Year = factor(Year)) %>%
arrange(sip) %>%
pivot_wider(names_from = Year, values_from = Grp)
df_sankey = data %>% make_long(`2015`,`2017`,`2019`)
ggplot(df_sankey, aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node),
label = node,
color=factor(node) )) +
geom_sankey(flow.alpha = 0.5, node.color = 1) +
geom_sankey_label(size = 3.5, color = 1, fill = "white") +
scale_fill_viridis_d() +
scale_colour_viridis_d() +
theme_sankey(base_size = 16) +
theme(legend.position = "none") + xlab('')
Figure:
Any idea how to omit the missing groups every year as stratum (without omitting them in the alluvium) will be super helpful. Thanks!
Solved! The solution was much easier than I thought. I'll leave it here in case someone else struggles with a similar problem.
Create a wide table of counts per every group / cohort.
# Data with 3 cohorts for years 2015, 2017 and 2019
# Grp is a factor with 6 levels: 1 to 6
# sip is a unique ID
library(tidyverse)
data_wide = data %>%
select(sip, Year, Grp) %>%
mutate(Grp = factor(Grp, levels=c(1:6)), Year = factor(Year)) %>%
arrange(sip) %>%
pivot_wider(names_from = Year, values_from = Grp)
Using the ggsankey package, we can transform the wide table into the long format the package expects. There's already a useful function for this, make_long():
library(ggsankey)
df_sankey = data_wide %>% make_long(`2015`,`2017`,`2019`)
# The tibble accounts for every change in X axis and Y categorical value (node):
> head(df_sankey)
# A tibble: 6 × 4
x node next_x next_node
<fct> <chr> <fct> <chr>
1 2015 3 2017 2
2 2017 2 2019 2
3 2019 2 NA NA
4 2015 NA 2017 1
5 2017 1 2019 1
6 2019 1 NA NA
It looks like passing the output of pivot_wider() to make_long() completed every combination of values, including the missing ones as NA. Drop the NA values in 'node' and create the plot.
df_sankey %>% drop_na(node) %>%
ggplot(aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node),
label = node,
color=factor(node) )) +
geom_sankey(flow.alpha = 0.5, node.color = 1) +
geom_sankey_label(size = 3.5, color = 1, fill = "white") +
scale_fill_viridis_d() +
scale_colour_viridis_d() +
theme_sankey(base_size = 16) +
theme(legend.position = "none") + xlab('')
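To see where the NA nodes come from, here is a minimal sketch on hypothetical toy data (the `toy` data frame and its values are made up for illustration). It uses `pivot_longer()` only as a stand-in for `make_long()`, which additionally builds the `next_x` and `next_node` columns, so treat it as an illustration of the idea rather than the exact ggsankey behaviour.

```r
library(tidyr)

# Hypothetical toy data: patient "b" has no 2017 record
toy <- data.frame(
  sip  = c("a", "a", "b"),
  Year = c("2015", "2017", "2015"),
  Grp  = c("1", "2", "3")
)

# Widening creates a 2017 cell for "b" and fills it with NA
toy_wide <- pivot_wider(toy, names_from = Year, values_from = Grp)

# Going back to long format keeps that NA as an explicit row
toy_long <- pivot_longer(toy_wide, -sip, names_to = "x", values_to = "node")

# Dropping rows with a missing node removes the unwanted category
toy_long_clean <- drop_na(toy_long, node)
```

The same `drop_na(node)` call is what removes the unlabeled stratum in the answer above.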
With my dataframe that looks like this (I have in total 1322 rows) :
I'd like to make a bar plot with the percentage of rating of the CFS score. It should look similar to this :
With this code, I can make a single bar plot for the column cfs_triage :
ggplot(data = df) +
geom_bar(mapping = aes(x = cfs_triage, y = (..count..)/sum(..count..)))
But I can't figure out how to make one with the three variables next to one another.
Thank you in advance to everyone who helps me make this bar plot with the percentage of ratings for these three variables! (I'm not sure my explanation is very clear, but I hope it is. :))
Your best bet here is to pivot your data into long format. We don't have your data, but we can reproduce a similar data set like this:
set.seed(1)
df <- data.frame(cfs_triage = sample(10, 1322, TRUE, prob = 1:10),
cfs_silver = sample(10, 1322, TRUE),
cfs_student = sample(10, 1322, TRUE, prob = 10:1))
df[] <- lapply(df, function(x) { x[sample(1322, 300)] <- NA; x})
Now the dummy data set looks a lot like yours:
head(df)
#> cfs_triage cfs_silver cfs_student
#> 1 9 NA 1
#> 2 8 4 2
#> 3 NA 8 NA
#> 4 NA 10 9
#> 5 9 5 NA
#> 6 3 1 NA
If we pivot into long format, then we will end up with two columns: one containing the values, and one containing the column name that the value belonged to in the original data frame:
library(tidyverse)
df_long <- df %>%
pivot_longer(everything())
head(df_long)
#> # A tibble: 6 x 2
#> name value
#> <chr> <int>
#> 1 cfs_triage 9
#> 2 cfs_silver NA
#> 3 cfs_student 1
#> 4 cfs_triage 8
#> 5 cfs_silver 4
#> 6 cfs_student 2
This then allows us to plot with value on the x axis, and we can use name as a grouping / fill variable:
ggplot(df_long, aes(value, fill = name)) +
geom_bar(position = 'dodge') +
scale_fill_grey(name = NULL) +
theme_bw(base_size = 16) +
scale_x_continuous(breaks = 1:10)
#> Warning: Removed 900 rows containing non-finite values (`stat_count()`).
Created on 2022-11-25 with reprex v2.0.2
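The plot above shows raw counts. Since the question asked for percentages, one possible variation is to compute the share of each score within each variable before plotting. This is a sketch on the same dummy data; the names `df_pct`, `pct` and `p` are introduced here for illustration, and missing values are dropped before the percentages are computed, which is an assumption about how they should be treated.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same dummy data as in the answer above
set.seed(1)
df <- data.frame(cfs_triage  = sample(10, 1322, TRUE, prob = 1:10),
                 cfs_silver  = sample(10, 1322, TRUE),
                 cfs_student = sample(10, 1322, TRUE, prob = 10:1))
df[] <- lapply(df, function(x) { x[sample(1322, 300)] <- NA; x })

# Share of each score within each variable (NAs excluded from the denominator)
df_pct <- df %>%
  pivot_longer(everything()) %>%
  drop_na(value) %>%
  count(name, value) %>%
  group_by(name) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

p <- ggplot(df_pct, aes(factor(value), pct, fill = name)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_grey(name = NULL) +
  labs(x = "CFS score", y = "Percentage of ratings") +
  theme_bw(base_size = 16)
p
```

Because the percentages are computed per variable, the bars of each fill colour sum to 100%.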
Maybe you need something like this. The formatting was taken from @Allan Cameron (many thanks!):
library(tidyverse)
library(scales)
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id) %>%
group_by(id) %>%
mutate(percent = value/sum(value, na.rm = TRUE)) %>%
mutate(percent = ifelse(is.na(percent), 0, percent)) %>%
mutate(my_label = str_trim(paste0(format(100 * percent, digits = 1), "%"))) %>%
ggplot(aes(x = factor(name), y = percent, fill = factor(name), label = my_label))+
geom_col(position = position_dodge())+
geom_text(aes(label = my_label), vjust=-1) +
facet_wrap(. ~ id, nrow=1, strip.position = "bottom")+
scale_fill_grey(name = NULL) +
scale_y_continuous(labels = scales::percent)+
theme_bw(base_size = 16)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
I want to plot the distribution of the overall number of wins of a player. I would like the last section of the x-axis to be a "more than the values before" category.
Example data:
game_data <- data.frame(player = c(1,2,3,4,5, 6), n_wins = c(1,8,2,3,6,4))
game_data
player n_wins
1 1 1
2 2 8
3 3 2
4 4 3
5 5 6
6 6 4
The following code creates a category "NA", but I want it to be 5+ (= more than 5 wins).
game_data %>% group_by(player) %>% summarise(allwins = sum(n_wins)) %>%
ggplot(aes(x = cut(allwins, breaks = seq(1,6, by = 1)), include.lowest=TRUE)) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(labels=scales::percent) +
labs(title="Distribution of Wins", subtitle="", y="Fraction of Players", x="Number of Wins")
I do not only want to change the label, I want it to automatically create the last category.
You can do this by including +Inf as a break. Note that none of your values falls in the 5 bin, so you also need drop=FALSE in scale_x_discrete to keep that empty category on the axis:
set.seed(100)
game_data <- data.frame(player = c(1,2,3,4,5, 6), n_wins = c(1,8,2,3,6,4))
BR = c(0:5,+Inf)
game_data %>%
group_by(player) %>% summarise(allwins = sum(n_wins)) %>%
ggplot(aes(x = cut(allwins, breaks = BR,labels=c(1:5,"5+")))) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(labels=scales::percent) +
labs(title="Distribution of Wins", subtitle="",
y="Fraction of Players", x="Number of Wins")+
scale_x_discrete(drop=FALSE)
Maybe a small comment, why do you need to summarize the data?
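To check how the +Inf break builds the open-ended category, it can help to run cut() on its own, outside ggplot. A small sketch using the n_wins values from the question (the name `bins` is introduced here for illustration):

```r
n_wins <- c(1, 8, 2, 3, 6, 4)

# Intervals are (0,1], (1,2], ..., (4,5], (5,Inf]; the last one is labelled "5+"
bins <- cut(n_wins, breaks = c(0:5, Inf), labels = c(1:5, "5+"))

# The empty "5" level is kept in the factor, which is why the plot
# needs scale_x_discrete(drop = FALSE) to show it
table(bins)
```

Both 6 and 8 land in the "5+" bin, and no value is turned into NA.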
Given the following matrix
df <- matrix(c(10,8, 20, 6, 20, 25,"exp", "cnt", "exp","cnt","exp","cnt","gene1","gene1","gene2","gene2","gene3","gene3"),
nrow=6, dimnames=list(c("1", "2", "3","4","5","6"),c("Abundance", "Group","gene") ))
I would like to plot horizontally the barplot for two groups "exp" and "cnt" separated by a vertical line at zero, the y axis displaying positive values corresponding to each gene and the gene name in the x axis.
Here an example:
I tried the following code using ggplot but it didn't work.
ggplot(df, aes(x=gene))+
geom_bar(aes(y=Abundance, fill="exp"), stat="identity")+
geom_bar(aes(y=-Abundance, fill="cnt"), stat="identity")+
scale_fill_manual("Group",values=c(exp="red",cnt="green"))+
labs(y="Abundance")+coord_flip()
Any suggestions?
Your code is right, but your problem is that you're storing your data as a matrix. ggplot only takes data.frames as input. A second problem is that a matrix can only hold one data type, so everything is cast to character (thus it will give an error when you try to make Abundance negative).
Put your data in a data.frame and it will work:
library(tidyverse)
df <- tibble(Abundance = c(10, 8, 20, 6, 20, 25),
Group = c("exp", "cnt", "exp", "cnt", "exp", "cnt"),
gene = rep(paste0("gene", 1:3), each = 2))
df
#> # A tibble: 6 x 3
#> Abundance Group gene
#> <dbl> <chr> <chr>
#> 1 10 exp gene1
#> 2 8 cnt gene1
#> 3 20 exp gene2
#> 4 6 cnt gene2
#> 5 20 exp gene3
#> 6 25 cnt gene3
ggplot() +
geom_bar(data = filter(df, Group == "cnt"),
aes(x = gene, y = Abundance, fill = Group),
stat = 'identity', position = 'stack') +
geom_bar(data = filter(df, Group == "exp"),
aes(x = gene, y = -Abundance, fill = Group),
stat = 'identity', position = 'stack') +
coord_flip() +
geom_hline(yintercept = 0, linetype = "dashed")
Created on 2019-07-11 by the reprex package (v0.3.0)
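If retyping the data is not practical, the existing matrix can also be converted. This is a sketch, assuming the matrix from the question; the names `m` and `df_fixed` are introduced here for illustration.

```r
# The matrix from the question: mixing numbers and strings forces character storage
m <- matrix(c(10, 8, 20, 6, 20, 25,
              "exp", "cnt", "exp", "cnt", "exp", "cnt",
              "gene1", "gene1", "gene2", "gene2", "gene3", "gene3"),
            nrow = 6,
            dimnames = list(as.character(1:6), c("Abundance", "Group", "gene")))

typeof(m)  # every cell, including Abundance, is stored as character

# Convert to a data frame and restore the numeric column
df_fixed <- as.data.frame(m, stringsAsFactors = FALSE)
df_fixed$Abundance <- as.numeric(df_fixed$Abundance)

# Negating the column now works
-df_fixed$Abundance
```

After this conversion, `df_fixed` can be passed to ggplot() in place of the tibble.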
I have a question concerning the ordering of stacked bars in a swimmer plot using ggplot2 in R.
I have a sample dataset of (artificial) patients, who receive treatments.
library(tidyverse)
df <- read.table(text="patient start_t_1 t_1_duration start_t_2 t_2_duration start_t_3 t_3_duration start_t_4 t_4_duration end
1 0 1.5 1.5 3 NA NA 4.5 10 10
2 0 2 4.5 2 NA NA 2 2.5 10
3 0 5 5 2 7 0.5 7.5 2 9.5
4 0 8 NA NA NA NA 8 2 10", header=TRUE)
All patients start the first treatment at time = 0. Subsequently, patients get different treatments (numbered t_2 up to t_4).
I tried to plot the swimmer plot, using the following code:
df %>%
gather(variable, value, c(t_1_duration, t_2_duration, t_3_duration, t_4_duration)) %>%
ggplot(aes(x = patient, y = value, fill = variable)) +
geom_bar(stat = "identity") +
coord_flip()
However, the treatments are not displayed in the right order.
For example: patient 3 receives all treatments in consecutive order, while patient 2 first receives treatment 1, then 4 and finally 2.
So, simply reversing the order does not work.
How do I order the stacked bars in a chronological way?
What about this:
df %>%
gather(variable, value, c(t_1_duration, t_2_duration, t_3_duration,t_4_duration)) %>%
ggplot(aes(x = patient,
y = value,
# here you can specify the order of the variable
fill = factor(variable,
levels =c("t_4_duration", "t_3_duration", "t_2_duration","t_1_duration")))) +
geom_bar(stat = "identity") +
coord_flip()+ guides(fill=guide_legend("My title"))
EDIT:
That has been a long trip, because it involves a kind of hack. I think it's not a dupe of that question, because it also involves some data reshaping:
library(reshape2)
# divide starts and duration
starts <- df %>% select(patient, start_t_1, start_t_2, start_t_3, start_t_4)
duration <- df %>% select(patient, t_1_duration,t_2_duration, t_3_duration, t_4_duration)
# here you melt them
starts <- melt(starts, id = 'patient') %>%
mutate(keytreat = substr(variable,nchar(as.vector(variable))-2, nchar(as.vector(variable)))) %>%
`colnames<-`(c("patient", "variable", "start","keytreat")) %>% select(-variable)
duration <- melt(duration, id = 'patient') %>% mutate(keytreat = substr(variable,1, 3)) %>%
`colnames<-`(c("patient", "variable", "duration","keytreat")) %>% select(-variable)
# join
dats <- starts %>% left_join(duration) %>% arrange(patient, start) %>% filter(!is.na(start))
# here the part for the plot
bars <- map(unique(dats$patient)
, ~geom_bar(stat = "identity", position = "stack"
, data = dats %>% filter(patient == .x)))
dats %>%
ggplot(aes(x = patient,
y = duration,
fill = reorder(keytreat,-start))) +
bars +
guides(fill=guide_legend("ordering")) + coord_flip()