Swimmer plot in R (ggplot): How to order stacked bars? - r

I have a question about ordering the stacked bars in a swimmer plot made with ggplot2 in R.
I have a sample dataset of (artificial) patients, who receive treatments.
library(tidyverse)
df <- read.table(text="patient start_t_1 t_1_duration start_t_2 t_2_duration start_t_3 t_3_duration start_t_4 t_4_duration end
1 0 1.5 1.5 3 NA NA 4.5 10 10
2 0 2 4.5 2 NA NA 2 2.5 10
3 0 5 5 2 7 0.5 7.5 2 9.5
4 0 8 NA NA NA NA 8 2 10", header=TRUE)
All patients start the first treatment at time = 0. Subsequently, patients get different treatments (numbered t_2 up to t_4).
I tried to plot the swimmer plot, using the following code:
df %>%
gather(variable, value, c(t_1_duration, t_2_duration, t_3_duration, t_4_duration)) %>%
ggplot(aes(x = patient, y = value, fill = variable)) +
geom_bar(stat = "identity") +
coord_flip()
However, the treatments are not displayed in the right order.
For example: patient 3 receives all treatments in consecutive order, while patient 2 receives treatment 1 first, then 4, and finally 2.
So, simply reversing the order does not work.
How do I order the stacked bars in a chronological way?

What about this:
df %>%
gather(variable, value, c(t_1_duration, t_2_duration, t_3_duration,t_4_duration)) %>%
ggplot(aes(x = patient,
y = value,
# here you can specify the order of the variable
fill = factor(variable,
levels =c("t_4_duration", "t_3_duration", "t_2_duration","t_1_duration")))) +
geom_bar(stat = "identity") +
coord_flip()+ guides(fill=guide_legend("My title"))
EDIT:
That took a while, because it involves a bit of a hack. I don't think it is a dupe of that question, because it also involves some data reshaping:
library(reshape2)
# divide starts and duration
starts <- df %>% select(patient, start_t_1, start_t_2, start_t_3, start_t_4)
duration <- df %>% select(patient, t_1_duration,t_2_duration, t_3_duration, t_4_duration)
# here you melt them
starts <- melt(starts, id = 'patient') %>%
mutate(keytreat = substr(variable,nchar(as.vector(variable))-2, nchar(as.vector(variable)))) %>%
`colnames<-`(c("patient", "variable", "start","keytreat")) %>% select(-variable)
duration <- melt(duration, id = 'patient') %>% mutate(keytreat = substr(variable,1, 3)) %>%
`colnames<-`(c("patient", "variable", "duration","keytreat")) %>% select(-variable)
# join
dats <- starts %>% left_join(duration) %>% arrange(patient, start) %>% filter(!is.na(start))
# here the part for the plot
bars <- map(unique(dats$patient)
, ~geom_bar(stat = "identity", position = "stack"
, data = dats %>% filter(patient == .x)))
dats %>%
ggplot(aes(x = patient,
y = duration,
fill = reorder(keytreat,-start))) +
bars +
guides(fill=guide_legend("ordering")) + coord_flip()
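Another way to get chronological order, sketched here on the assumption that the start and duration columns of df can be taken at face value, is to skip stacking altogether and draw each treatment as a rectangle running from its start to start + duration. The names treatments, long and end_time are made up for this illustration; the reshaping is plain base R and geom_rect() is a standard ggplot2 geom.

```r
library(ggplot2)

df <- read.table(text = "patient start_t_1 t_1_duration start_t_2 t_2_duration start_t_3 t_3_duration start_t_4 t_4_duration end
1 0 1.5 1.5 3 NA NA 4.5 10 10
2 0 2 4.5 2 NA NA 2 2.5 10
3 0 5 5 2 7 0.5 7.5 2 9.5
4 0 8 NA NA NA NA 8 2 10", header = TRUE)

# One row per patient and treatment (illustrative helper names)
treatments <- paste0("t_", 1:4)
long <- do.call(rbind, lapply(treatments, function(tr) {
  data.frame(patient   = df$patient,
             treatment = tr,
             start     = df[[paste0("start_", tr)]],
             duration  = df[[paste0(tr, "_duration")]])
}))

# Drop treatments a patient never received, then compute where each one ends
long <- long[!is.na(long$start), ]
long$end_time <- long$start + long$duration

# Each rectangle is placed by its own start time, so no stacking order is involved
p <- ggplot(long) +
  geom_rect(aes(xmin = start, xmax = end_time,
                ymin = patient - 0.4, ymax = patient + 0.4,
                fill = treatment)) +
  scale_y_continuous(breaks = df$patient) +
  labs(x = "time", y = "patient")
print(p)
```

Because every rectangle is positioned by its start time, patient 2 shows treatment 1, then 4, then 2 without any factor reordering.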

Related

Plotting group proportions with continuous variable

I would like to plot the proportion of each level of a group along a continuous variable. Since the x-axis is continuous, proportions cannot be computed at every point (there are infinitely many of them). So one usually cuts the continuous variable into bins and plots those. Another option is to use the density, but I want proportions (that is, percentages) on the y-axis, and as far as I know a density is not a proportion.
As an example, let's use iris and try to plot the share of each species among Sepal.Length. One can create bins using Hmisc::cut2 and then count the proportions for each group:
library(tidyverse)
library(Hmisc)
dat <- iris %>%
mutate(Sepal.Length = Sepal.Length + rnorm(n()),
cut = cut2(Sepal.Length, g = 30, levels.mean = T)) %>%
group_by(cut) %>%
summarise(set = sum(Species == "setosa") / n(),
vir = sum(Species == "virginica") / n(),
ver = sum(Species == "versicolor") / n()) %>%
pivot_longer(-cut)
# A tibble: 90 x 3
cut name value
<fct> <chr> <dbl>
1 3.0126 set 0.6
2 3.0126 vir 0
3 3.0126 ver 0.4
4 3.7616 set 0.8
5 3.7616 vir 0
6 3.7616 ver 0.2
7 3.9898 set 0.8
8 3.9898 vir 0
9 3.9898 ver 0.2
10 4.1577 set 0.2
# ... with 80 more rows
And the plot looks like this, e.g. for name == "ver"
dat %>%
filter(name == "ver") %>%
ggplot(aes(x = cut, y = value)) +
geom_col()
Now, is there any way to make this easier and more aesthetically pleasing?
In particular, I would like to make the x-axis continuous again, so that one could, e.g., draw a geom_line between the columns of the plot (maybe using rolling means?). Or is this bad practice, and is that why I can't find any documentation about it?
Setting the variable cut to numeric did the job, but there may still be better options.
dat %>%
filter(name == "ver") %>%
ggplot(aes(x = as.numeric(as.character(cut)), y = value)) +
geom_col()
Or with a line:
dat %>%
filter(name == "ver") %>%
ggplot(aes(x = as.numeric(as.character(cut)), y = value)) +
geom_line()
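The binning step can also be sketched in base R only, which makes it easier to see where the proportions come from. This is an illustration under assumptions, not the approach used above: it uses fixed-width bins of 0.5 instead of Hmisc::cut2 quantile groups, leaves out the added rnorm noise, and the names breaks, bin, prop and mid are made up for the example.

```r
# Fixed-width bins covering the observed range of Sepal.Length (4.3 to 7.9)
breaks <- seq(4, 8, by = 0.5)
bin <- cut(iris$Sepal.Length, breaks = breaks, include.lowest = TRUE)

# Share of each species within each bin: margin = 1 makes every row sum to 1
prop <- prop.table(table(bin, iris$Species), margin = 1)

# Numeric bin midpoints give a continuous x-axis again
mid <- head(breaks, -1) + diff(breaks) / 2

plot(mid, prop[, "versicolor"], type = "b",
     xlab = "Sepal.Length (bin midpoint)", ylab = "share of versicolor")
```

Using midpoints rather than bin labels keeps the x-axis numeric, so lines or smoothers can be drawn across bins.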

How do I create a stacked bar chart in R, where the y axis should denote the percentages for the bars?

I would like to create a stacked bar chart in R. My x-axis just contains data on sex, i.e. male or female. I need the y-axis to show the percentages of the stacked bars. The "Survived" column is a mixture of 0s and 1s: 1 denotes that an individual survived an experience and 0 that the individual did not. I am not sure what to put in for y. Can anyone help, please?
ggplot(data = df, mapping = aes(x = Sex, y = ? , fill = Survived)) + geom_bar(stat = "identity")
One possible solution is to use the dplyr package to calculate the percentage of each category outside of ggplot2 and then use those values to draw the bar graph with geom_col:
library(dplyr)
df %>% count(Sex, Survive) %>%
group_by(Sex) %>%
mutate(Percent = n/sum(n)*100)
# A tibble: 4 x 4
# Groups: Sex [2]
Sex Survive n Percent
<fct> <dbl> <int> <dbl>
1 F 0 26 55.3
2 F 1 21 44.7
3 M 0 34 64.2
4 M 1 19 35.8
And now with the plotting part:
library(dplyr)
library(ggplot2)
df %>% count(Sex, Survive) %>%
group_by(Sex) %>%
mutate(Percent = n/sum(n)*100) %>%
ggplot(aes(x = Sex, y = Percent, fill = as.factor(Survive)))+
geom_col()
Reproducible example
df <- data.frame(Sex = sample(c("M","F"),100, replace = TRUE),
Survive = sample(c(0,1), 100, replace = TRUE))
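As an alternative to computing the percentages beforehand, ggplot2 can normalise each bar itself with position = "fill". The sketch below assumes the same column names as the reproducible example (Sex, Survive); the seed is only there so the illustrative data are repeatable, and scales::percent is used purely for axis labels.

```r
library(ggplot2)

set.seed(42)  # only so the illustrative data are reproducible
df <- data.frame(Sex = sample(c("M", "F"), 100, replace = TRUE),
                 Survive = sample(c(0, 1), 100, replace = TRUE))

# geom_bar() counts the rows; position = "fill" rescales each bar to a total of 1
p <- ggplot(df, aes(x = Sex, fill = factor(Survive))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percent", fill = "Survive")
print(p)
```

With this approach no y aesthetic is needed, since the heights come from the counts.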

100% percentage barplot in R

I have a dataset like so:
I want to create a 100% bar plot from this... such that there is a 100% bar for status, and a 100% bar for Type.... like so:
The picture only has one bar for status, but I want two bars side by side: one for status and one for Type.
Any help would be appreciated. I want to do this in R.
Simple base R solution:
p1 <- as.matrix(prop.table(table(data$status))) * 100
p2 <- as.matrix(prop.table(table(data$Type))) * 100
op <- par(mfrow=c(1,2), las=1, mar=c(3,4,1,0))
barplot(p1, legend=TRUE, names="status", ylab="Percent")
barplot(p2, legend=TRUE, names="Type")
par(op)
data <- data.frame(id=1:10,
status=c("P","F","F","P","F","P","P","F","P","P"),
Type=c("full","full","full","part","part","full","full","part","part","full"))
data
id status Type
1 1 P full
2 2 F full
3 3 F full
4 4 P part
5 5 F part
6 6 P full
7 7 P full
8 8 F part
9 9 P part
10 10 P full
Maybe with ggplot2
data %>%
pivot_longer(-id) %>%
group_by(name, value) %>%
summarise(n=n()) %>%
ggplot(aes(fill=value, y=n, x=name)) +
geom_bar(position="fill", stat="identity") # Needs polishing
Although the information provided is a little scarce, you seem to be looking for a stacked bar plot with percentages. You might try something like:
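# Some sample data (needs library(tidyverse); rbernoulli() comes from purrr
# and is deprecated since purrr 1.0.0, where runif(10) < 0.3 can be used instead):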
dta <- tibble(id = 1:10,
status = rbernoulli(n = 10, p = 0.3),
type = rbernoulli(n = 10, p = 0.6))
# Transformation and plotting:
dta %>%
pivot_longer(c(status, type), names_to = "variable") %>%
group_by(variable, value) %>%
summarize(percent = n()) %>%
mutate(percent = percent / sum(percent)) %>%
ungroup() %>%
ggplot() +
aes(x = variable, y = percent, fill = value) +
geom_bar(stat = "identity") +
theme_bw()
Resulting in:
Does this help?

How to draw stacked barplot on the summed data

For data called df that reads:
car suv pickup
1 2 1
2 3 4
4 1 2
5 4 2
3 1 1
total = apply(df,1,sum)
barplot(total,col= rainbow(5))
So far I have plotted a barplot of the total number of vehicles, which is the sum of each row. What I want to do now is present it as a stacked barplot on that sum.
For now, it just shows "total", without any lines indicating that, e.g., 1 car, 2 suv and 1 pickup add up to a "total" of 4.
Note: this is different from barplot(matrix(df)), because that just divides it by car, suv and pickup and disregards the total.
You can achieve this easily using ggplot2 and reshape2.
You will need an ID column to track the rows, so I have added that in. I melt the data to long type so that the different groups can be managed and plotted accordingly.
Then plot using geom_bar, specifying the row ids as the x axis and the groupings (fill and colour) for the stack plot and legend.
library(reshape2)
library(ggplot2)
df <- data.frame("ID" = c(1,2,3,4,5), "car" = c(1,2,4,5,3), "suv" = c(2,3,1,4,1), "pickup" = c(1, 4, 2, 2, 1))
# melt() is called directly because neither reshape2 nor ggplot2 provides %>%
long_df <- melt(df, id.vars = "ID", value.name = "Number", variable.name = "Type")
ggplot(data = long_df, aes(x = ID, y = Number)) +
geom_bar(aes(fill = Type, colour = Type),
stat = "identity",
position = "stack")
With base R
# barplot() stacks the rows of a numeric matrix within each column, so put the
# vehicle types in rows and the original data rows in columns
m <- t(as.matrix(df[, c("car", "suv", "pickup")]))
barplot(m, names.arg = df$ID, legend.text = TRUE)
Are you after something like this?
library(tidyverse)
df %>%
rowid_to_column("row") %>%
gather(k, v, -row) %>%
ggplot(aes(row, v, fill = k)) +
geom_col()
We use a stacked barplot here, so there is no need to manually calculate the sum. The key here is to transform data from wide to long and keep track of the row.
Sample data
df <- read.table(text =
"car suv pickup
1 2 1
2 3 4
4 1 2
5 4 2
3 1 1", header = T)

split data into groups in R

My data frame looks like this:
plant distance
one 0
one 1
one 2
one 3
one 4
one 5
one 6
one 7
one 8
one 9
one 9.9
two 0
two 1
two 2
two 3
two 4
two 5
two 6
two 7
two 8
two 9
two 9.5
I want to split the distance of each plant into groups by a fixed interval (for instance, interval = 3) and compute the percentage of each group. Finally, I want to plot the percentage of each group for each plant, similar to this:
my code:
library(ggplot2)
library(dplyr)
dat <- data %>%
mutate(group = factor(cut(distance, seq(0, max(distance), 3), F))) %>%
group_by(plant, group) %>%
summarise(percentage = n()) %>%
mutate(percentage = percentage / sum(percentage))
p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
geom_bar(stat = "identity", position = "stack")+
scale_y_continuous(labels = scales::percent)
p
But my plot is shown below: group 4 is missing.
I found that dat was wrong: group 4 was NA.
My guess is that group 4 is shorter than the interval of 3, so my question is how to fix it. Thank you in advance!
I have solved the problem. The breaks seq(0, max(distance), 3) are 0, 3, 6 and 9, so cut() builds the intervals (0,3], (3,6] and (6,9]. The minimum 0 is excluded because include.lowest defaults to FALSE, and the maxima 9.9 and 9.5 lie above the last break, so all of those values become NA.
Here is my solution:
dat <- my_data %>%
mutate(group = factor(cut(distance, seq(from = min(distance), by = 3, length.out = n()/ 3 + 1), include.lowest = TRUE))) %>%
count(plant, group) %>%
group_by(plant) %>%
mutate(percentage = n / sum(n))
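The effect of the breaks can be checked on a small vector. This is a minimal sketch assuming non-negative distances that start at 0 and an interval of 3; the names x, narrow, upper and wide are made up for the illustration.

```r
x <- c(0, 3, 9, 9.5, 9.9)

# Breaks 0, 3, 6, 9 give (0,3], (3,6], (6,9]: 0 and everything above 9 fall outside
narrow <- cut(x, breaks = seq(0, max(x), by = 3), labels = FALSE)
# narrow is NA 1 3 NA NA

# Round the last break up past the maximum and close the first interval on the left
upper <- ceiling(max(x) / 3) * 3
wide <- cut(x, breaks = seq(0, upper, by = 3), labels = FALSE,
            include.lowest = TRUE)
# wide is 1 1 3 4 4
```

Rounding the upper break with ceiling() avoids having to guess how many breaks are needed from the number of rows.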
