Problem Description
I am trying to make a swimmer plot in R using ggplot2. However, I run into a problem when I want 'empty' space between two stacked bars of the plot: the bars are drawn directly next to one another.
Code & Sample data
I have the following sample data:
# Sample data
df <- read.table(text="patient start keytreat duration
sub-1 0 treat1 3
sub-1 8 treat2 2
sub-1 13 treat3 1.5
sub-2 0 treat1 4.5
sub-3 0 treat1 4
sub-3 4 treat2 8
sub-3 13.5 treat3 2", header=TRUE)
When I use the following code to generate a swimmerplot, I end up with a swimmerplot of 3 subjects. Subject 2 received only 1 treatment (treatment 1), so this displays correctly.
However, subject 1 received 3 treatments: treatment 1 from time point 0 up to time point 3, then nothing from 3 to 8, then treatment 2 from 8 until 10 etc...
The data are plotted so that, for patients 1 and 3, all treatments appear consecutive instead of having 'empty' intervals in between.
# Plot: bars (map() and %>% come from the tidyverse)
library(tidyverse)
bars <- map(unique(df$patient)
, ~geom_bar(stat = "identity", position = "stack", width = 0.6
, data = df %>% filter(patient == .x)))
# Create plot
ggplot(data = df, aes(x = patient,
y = duration,
fill = reorder(keytreat,-start))) +
bars +
guides(fill=guide_legend("ordering")) +
coord_flip()
Question
How do I include empty spaces between two non-consecutive treatments in this swimmerplot?
I don't think geom_bar is the right geom in this case. It's really meant for showing frequencies or counts and you can't explicitly control their start or end coordinates.
geom_segment is probably what you want:
library(tidyverse)
# Sample data
df <- read.table(text="patient start keytreat duration
sub-1 0 treat1 3
sub-1 8 treat2 2
sub-1 13 treat3 1.5
sub-2 0 treat1 4.5
sub-3 0 treat1 4
sub-3 4 treat2 8
sub-3 13.5 treat3 2", header=TRUE)
# Add end of treatment
df_wrangled <- df %>%
mutate(end = start + duration)
ggplot(df_wrangled) +
geom_segment(
aes(x = patient, xend = patient, y = start, yend = end, color = keytreat),
size = 8
) +
coord_flip()
Created on 2019-03-29 by the reprex package (v0.2.1)
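If filled rectangles are preferred over thick line segments, the same start/end idea can be sketched with geom_rect. This is an alternative to the answer above, not part of it. The row column and the 0.3 half-height are assumptions introduced here for illustration.

```r
library(ggplot2)

df <- read.table(text = "patient start keytreat duration
sub-1 0 treat1 3
sub-1 8 treat2 2
sub-1 13 treat3 1.5
sub-2 0 treat1 4.5
sub-3 0 treat1 4
sub-3 4 treat2 8
sub-3 13.5 treat3 2", header = TRUE)

# End of each treatment, as in the answer above
df$end <- df$start + df$duration

# Hypothetical helper column: a numeric row position for each patient
patients <- levels(factor(df$patient))
df$row <- as.integer(factor(df$patient, levels = patients))

# Each treatment becomes a rectangle from start to end, so gaps stay empty
p <- ggplot(df) +
  geom_rect(aes(xmin = start, xmax = end,
                ymin = row - 0.3, ymax = row + 0.3,
                fill = keytreat)) +
  scale_y_continuous(breaks = seq_along(patients), labels = patients) +
  labs(x = "time", y = "patient")
p
```

Because the rectangles are positioned by their own coordinates, no coord_flip() is needed and the bar thickness is set in data units rather than by a line size.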
I have a table of data which already contains several values to be plotted as a bar plot with the ggplot2 package (the data are already cumulative).
The data in the data frame "reserves" has the form (simplified):
period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15
The first column (period) is the geological epoch. It will be on x axis, and I needed to have no extra ordering on it, so I prepared appropriate factor labelling with the command
reserves$period <- factor(reserves$period, levels = reserves$period)
The column "amount" is the main column to be plotted on the y axis (it is the percentage of hydrocarbons in each epoch, but it could just as well be in absolute values, say, millions of tons). So the basic plot is produced by the command:
ggplot(reserves,aes(x=period,y=amount)) + geom_bar(stat="identity")
But here is the question. I need to plot the other values (a1-a2, b1-b2, and h1-h4) on the same bar graph. These are percentages within each letter group (for example, if a1=60, then a2=40; the same holds for b1-b2, and h1-h4 likewise sum to 100). So I need a1-a2 drawn in colors that proportionally divide the "amount" bar for each value of x (a stacked bar plot), and the same for b1-b2, so that each period has two adjacent columns (grouped bars), each of them stacked. I also need a third column for h1-h4, perhaps also as a stacked bar, placed either as a third column or as a staggered bar above the first one.
So the layout looks like this:
I learned that I first need to reshape the data with the reshape2 package and then use position="dodge" or position="fill" in geom_bar(), but here I need a combination of the two. And the third bar plot (for values h1-h4) seems to need a "stacked percent" representation with fixed height.
Are there packages that handle the data for plotting in a more intuitive way? Let's say we just declare that we want the variables ai, bi, hi to be plotted.
First you should reshape your data from wide to long, then scale your proportions to their raw values. Then split your old column names (now the values of "lett") into their letters and numbers for labeling. If your real data aren't named like this (a1...h4), there are ways to handle that as well.
library(dplyr)
library(tidyr)
library(ggplot2)
reserves <- read.csv(text = "period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15")
reserves.tidied <- reserves %>%
gather(key = lett, value = prop, -period, -amount) %>%
mutate(rawvalue = prop * amount/100,
lett1 = substr(lett, 1, 1),
num = substr(lett, 2, 2))
reserves.tidied
period amount lett prop rawvalue lett1 num
1 J 18.1 a1 30 5.430 a 1
2 K 29.0 a1 65 18.850 a 1
3 P 13.3 a1 94 12.502 a 1
4 N 21.6 a1 95 20.520 a 1
5 J 18.1 a2 60 10.860 a 2
6 K 29.0 a2 35 10.150 a 2
7 P 13.3 a2 6 0.798 a 2
8 N 21.6 a2 5 1.080 a 2
9 J 18.1 b1 40 7.240 b 1
10 K 29.0 b1 75 21.750 b 1
11 P 13.3 b1 85 11.305 b 1
12 N 21.6 b1 80 17.280 b 1
13 J 18.1 b2 60 10.860 b 2
14 K 29.0 b2 25 7.250 b 2
15 P 13.3 b2 15 1.995 b 2
16 N 21.6 b2 20 4.320 b 2
17 J 18.1 h1 15 2.715 h 1
18 K 29.0 h1 5 1.450 h 1
19 P 13.3 h1 10 1.330 h 1
20 N 21.6 h1 10 2.160 h 1
21 J 18.1 h2 50 9.050 h 2
22 K 29.0 h2 50 14.500 h 2
23 P 13.3 h2 55 7.315 h 2
24 N 21.6 h2 55 11.880 h 2
25 J 18.1 h3 30 5.430 h 3
26 K 29.0 h3 40 11.600 h 3
27 P 13.3 h3 20 2.660 h 3
28 N 21.6 h3 20 4.320 h 3
29 J 18.1 h4 5 0.905 h 4
30 K 29.0 h4 5 1.450 h 4
31 P 13.3 h4 15 1.995 h 4
32 N 21.6 h4 15 3.240 h 4
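In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping, assuming tidyr 1.0.0 or later, is shown here. The name reserves_long is introduced only for this example, and the rows come out ordered by period rather than by letter.

```r
library(dplyr)
library(tidyr)

reserves <- read.csv(text = "period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15")

# Same wide-to-long step as gather(), then the same derived columns
reserves_long <- reserves %>%
  pivot_longer(cols = -c(period, amount),
               names_to = "lett", values_to = "prop") %>%
  mutate(rawvalue = prop * amount / 100,
         lett1 = substr(lett, 1, 1),
         num = substr(lett, 2, 2))

reserves_long
```

The resulting columns match the gather() version, so the plotting code that follows should work with either data frame.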
Then to plot your tidied data, you want the letters across the x axis, and the rawvalue we just calculated with amount*proportion on the y axis. We stack the geom_col up from 1 to 2 or 1 to 4 (the reverse=T argument overrides the default, which would have 2 or 4 at the bottom of the stack). alpha and fill let us distinguish between groups in the same bar and between bars.
Then the geom_text labels each stacked segment with the name, a newline, and the original percentage, centered on each segment. The scale reverses the default behavior again, making 1 the darkest and 2 or 4 the lightest in each bar. Then you facet across, making one group of bars for each period.
ggplot(reserves.tidied,
aes(x = lett1, y = rawvalue, alpha = num, fill = lett1)) +
geom_col(position = position_stack(reverse = T), colour = "black") +
geom_text(position = position_stack(reverse = T, vjust = .5),
aes(label = paste0(lett, ":\n", prop, "%")), alpha = 1) +
scale_alpha_discrete(range = c(1, .1)) +
facet_grid(~period) +
guides(fill = "none", alpha = "none")
Rearranging it so that the "h" bars are different from the "a" and "b" bars is a bit more complex, and you'd have to think about how you want it presented, but it's totally doable.
I am trying to create a slopegraph with ggplot and geom_line. I want the lines for a subset of the data (e.g. those higher than 0.5) to be red and those at or below 0.5 to be another color. Here's my code:
library(ggplot2)
library(reshape2)
mydata <- read.csv("testset.csv")
mydatam = melt(mydata)
line plot:
ggplot(mydatam, aes(factor(variable), value, group = Gene, label = Gene)) +
geom_line(col='red')
In this case, all the lines are red. How do I draw red lines for those genes whose "Low" value is > 0.5 (there are 5 of them: aa, ac, ba, bc and bd) and black lines for the rest?
mydatam looks like this:
Gene variable value
1 aa Control 0.0
2 ab Control 0.0
3 ac Control 0.0
4 ad Control 0.0
5 ba Control 0.0
6 bb Control 0.0
7 bc Control 0.0
8 bd Control 0.0
9 aa Low 0.6
10 ab Low 0.2
11 ac Low 0.8
12 ad Low 0.1
13 ba Low 0.7
14 bb Low 0.3
15 bc Low 0.8
16 bd Low 1.2
17 aa High -0.6
18 ab High 1.6
19 ac High 2.1
20 ad High 0.7
21 ba High -1.2
22 bb High -0.7
23 bc High -0.8
24 bd High 0.6
You'll probably want to create a new variable in the data for this. Here's one way:
## Load dplyr package for data manipulation
library("dplyr")
## Genes where "Low" value is >0.5
genes <- mydatam[mydatam$variable == "Low" & mydatam$value > 0.5, "Gene"]
## Add new column
newdat <- mutate(mydatam, newval = ifelse(Gene %in% genes, ">0.5", "<=0.5"))
Now we can create the plot using newval to set the color.
## Color lines based on `newval` column
ggplot(newdat, aes(factor(variable), value, group = Gene, label = Gene)) +
geom_line(aes(color = newval)) +
scale_color_manual(values = c("<=0.5" = "#000000", ">0.5" = "#FF0000"))
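Since testset.csv is not available, here is a self-contained sketch of the same approach in base R plus ggplot2, with the data rebuilt by hand from the printed table. The names high_low and flag are introduced only for this example.

```r
library(ggplot2)

# Data reconstructed from the printed mydatam table
mydatam <- data.frame(
  Gene = rep(c("aa", "ab", "ac", "ad", "ba", "bb", "bc", "bd"), times = 3),
  variable = factor(rep(c("Control", "Low", "High"), each = 8),
                    levels = c("Control", "Low", "High")),
  value = c(rep(0, 8),
            0.6, 0.2, 0.8, 0.1, 0.7, 0.3, 0.8, 1.2,
            -0.6, 1.6, 2.1, 0.7, -1.2, -0.7, -0.8, 0.6),
  stringsAsFactors = FALSE
)

# Genes whose "Low" value exceeds 0.5
high_low <- mydatam$Gene[mydatam$variable == "Low" & mydatam$value > 0.5]

# Flag every row of those genes so the whole line gets one color
mydatam$flag <- ifelse(mydatam$Gene %in% high_low, "Low > 0.5", "Low <= 0.5")

p <- ggplot(mydatam, aes(variable, value, group = Gene, colour = flag)) +
  geom_line() +
  scale_colour_manual(values = c("Low > 0.5" = "red", "Low <= 0.5" = "black"))
p
```

Naming the entries of values ties each color to its label explicitly, so the result does not depend on how the labels happen to sort.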
I'm very new to R and I'm trying to build a scatter plot that codes my data according to shape, colour and fill. I want 5 different colours, 3 different shapes, and these to be either filled or not filled (in a non-filled point, I would still want the shape and the colour).
My data looks basically like this:
blank.test <- read.table(header=T, text="Colour Shape Fill X13C X15N
1 B B A 16 10
2 D A A 16 12
3 E A B 17 14
4 C A A 14 18
5 A A B 13 18
6 C B B 18 13
7 E C B 10 12
8 E A B 11 10
9 A C B 14 13
10 B A A 11 14
11 C B A 11 10
12 E B A 11 19
13 A B A 10 18
14 A C B 17 16
15 E B A 16 13
16 A C A 16 14")
If I do this:
ggplot(blank.test, aes(x=X13C, y=X15N,size=5)) +
geom_point(aes(shape=Shape,fill=Fill,color=Colour))
I get no filled or unfilled data points
I did a little research and it looks like the problem is with the symbols themselves, which cannot take different settings for line and fill; it was recommended that I use pch shapes between 21 and 25.
But if I do this:
ggplot(blank.test, aes(x=X13C, y=X15N,color=(Colour), shape=(Shape),fill=(Fill),size=5)) +
geom_point() + scale_shape_manual(values=c(21,22,25))
I still don't get what I want
I also tried playing around with scale_fill_manual without any good result.
I don't think you can use fill for points. What I would do is create an interaction between fill and shape and use this new factor to define both the shape and whether the symbol is filled or open:
blank.test$inter <- with(blank.test, interaction(Shape, Fill))
and then for your plot I would use something like this:
ggplot(blank.test, aes(x=X13C, y=X15N)) +
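geom_point(aes(shape=inter, color=Colour)) +
# interaction() varies Shape fastest: A.A, B.A, C.A, A.B, B.B, C.B,
# so the three open shapes come first, then the three filled ones
scale_shape_manual(name="shape", values=c(0, 1, 2, 15, 16, 17)) +
scale_color_manual(name="colour", values=c("red", "blue", "yellow", "green", "purple"))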
I can get the plot to work just fine, but the legend seems to absolutely insist on being black for fill. I can't figure out why. Maybe someone else has the answer to that one.
The 5 showing up in the legend is caused by having size inside aes(), where only aesthetics that change with your data belong.
Here is some example code:
ggplot(blank.test, aes(x = X13C, y = X15N, color = Colour, shape = Shape, fill = Fill)) +
geom_point(size = 5, stroke = 3) +
scale_shape_manual(values=c(21,22,25)) +
scale_color_brewer(palette = "Set2") +
scale_fill_brewer(palette = "Set1") +
theme_bw()
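With the code above, the keys of the fill legend may still be drawn with the default solid point shape, which ignores fill, so they can all look black. One common workaround is to override the key shape for that legend. This is a sketch rather than part of the original answer, and shape 21 for the keys is an arbitrary choice.

```r
library(ggplot2)

blank.test <- read.table(header = TRUE, text = "Colour Shape Fill X13C X15N
1 B B A 16 10
2 D A A 16 12
3 E A B 17 14
4 C A A 14 18
5 A A B 13 18
6 C B B 18 13
7 E C B 10 12
8 E A B 11 10
9 A C B 14 13
10 B A A 11 14
11 C B A 11 10
12 E B A 11 19
13 A B A 10 18
14 A C B 17 16
15 E B A 16 13
16 A C A 16 14")

p <- ggplot(blank.test,
            aes(x = X13C, y = X15N, colour = Colour, shape = Shape, fill = Fill)) +
  geom_point(size = 5, stroke = 2) +
  scale_shape_manual(values = c(21, 22, 25)) +
  # Draw the fill legend keys with a fillable shape so the fill is visible
  guides(fill = guide_legend(override.aes = list(shape = 21)))
p
```

override.aes only changes how the legend keys are drawn; the points in the plot itself keep the shapes set by scale_shape_manual().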
How can I overlap two time series with ggplot2 and keep both X labels (one with 1970 and another with 1980)?
This is an overview of my datasets and the code I use to plot each graphic.
> dataset1.data
Date Obs
1 1/1/1970 2.0
2 1/2/1970 1.0
3 1/3/1970 0.0
4 1/4/1970 0.0
5 1/5/1970 0.5
6 1/6/1970 5.1
7 1/7/1970 0.0
8 1/8/1970 0.0
> dataset2.data
Date Obs
1 1/1/1980 3.0
2 1/2/1980 0.5
3 1/3/1980 0.5
4 1/4/1980 5.0
5 1/5/1980 0.4
6 1/6/1980 6.2
7 1/7/1980 9.0
8 1/8/1980 1.3
qplot(main="Observations 1")+xlab("Date")+ylab("Obs")+
geom_point(data = dataset1.data,aes(Date, Obs, colour="blue"),alpha = 0.7,na.rm = TRUE)+
scale_colour_identity("Legend", breaks=c("blue"), labels="1970")
qplot(main="Observations 2")+xlab("Date")+ylab("Obs")+
geom_point(data = dataset2.data,aes(Date, Obs, colour="red"),alpha = 0.7,na.rm = TRUE)+
scale_colour_identity("Legend", breaks=c("red"), labels="1980")
I would put them both in a single dataset, and then use a new Year variable for the color aesthetic:
dataset1.data = read.table('dataset1.txt')
dataset2.data = read.table('dataset2.txt')
dataset1.data$Date = as.Date(dataset1.data$Date, format='%m/%d/%Y')
dataset2.data$Date = as.Date(dataset2.data$Date, format='%m/%d/%Y')
data = rbind(dataset1.data, dataset2.data)
data = transform(data, MonthDay=gsub('(.+)-(.+-.+)', '\\2', data$Date), Year=gsub('(.+)-(.+-.+)', '\\1', data$Date))
qplot(main="Observations 1")+xlab("Date")+ylab("Obs")+geom_point(data = data,aes(MonthDay, Obs, colour=Year),alpha = 0.7,na.rm = TRUE)
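The same idea can be sketched without regular expressions by letting format() pull the year and the month-day out of the parsed dates. The data frames are rebuilt by hand from the printed tables, and the names both, Year and MonthDay are chosen only for this example.

```r
library(ggplot2)

# Data reconstructed from the printed dataset1.data and dataset2.data
dataset1 <- data.frame(Date = paste0("1/", 1:8, "/1970"),
                       Obs = c(2, 1, 0, 0, 0.5, 5.1, 0, 0),
                       stringsAsFactors = FALSE)
dataset2 <- data.frame(Date = paste0("1/", 1:8, "/1980"),
                       Obs = c(3, 0.5, 0.5, 5, 0.4, 6.2, 9, 1.3),
                       stringsAsFactors = FALSE)

both <- rbind(dataset1, dataset2)
both$Date <- as.Date(both$Date, format = "%m/%d/%Y")

# Year drives the color; MonthDay is the shared x position
both$Year <- format(both$Date, "%Y")
both$MonthDay <- format(both$Date, "%m-%d")

p <- ggplot(both, aes(MonthDay, Obs, colour = Year)) +
  geom_point(alpha = 0.7, na.rm = TRUE) +
  labs(title = "Observations", x = "Date", y = "Obs")
p
```

Here MonthDay is a character column, so the x axis is discrete; both years share the same tick labels and the legend tells them apart.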
It's probably also possible to do it by editing the grid objects. For example, see: https://github.com/hadley/ggplot2/wiki/Editing-raw-grid-objects-from-a-ggplot