How to create separate facets for different measurements with tidyverse? - r

I am a novice programmer looking to plot highly grouped variables. Specifically, I am trying to plot a variable that is grouped by 5 other variables. Below is the example data I am working with.
library(ggplot2)
library(tibble)
set.seed(42)
mydf <- tibble(
grp = rep(c('A', 'B'), length.out = 32, each = 16),
sex = rep(c('M', 'F'), length.out = 32, each = 8),
cond = rep(c('Wet', 'Dry'), length.out = 32, each = 4),
measure = rep(c('Temperature', 'Volume'), length.out = 32, each = 2),
kind = rep(c('Experimental', 'Control'), length.out = 32, each = 1),
value = rnorm(32) * 100
)
ggplot(mydf, aes(x = grp, y = value, col = cond)) +
geom_point() +
facet_wrap(sex~measure + kind)
However, the output is quite messy. Would it be possible to create separate faceted plots for each measurement? What would be a proper way to graph this type of data?
Thank you

For ease of comparison, I would facet on no more than two variables. I would also use facet_grid() rather than facet_wrap() in such cases, as I think it's just easier to keep track of the different facet dimensions if they are on separate axes.
In your case, you want to distinguish measurements for 5 binary variables.
grp
sex
cond
measure
kind
With "grp" on the x-axis, "cond" distinguished by colour, and "sex" and "measure" on the facets, we'll need to introduce another aesthetic to distinguish the last variable, "kind".
In this case, since there aren't too many points to plot, I suggest shape.
ggplot(mydf, aes(x = grp, y = value,
color = cond,
shape = kind)) +
geom_point(size = 5, stroke = 2) +
facet_grid(sex~measure) +
scale_shape_manual(values = c("Control" = 4, "Experimental" = 16),
breaks = c("Experimental", "Control"))
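Using a solid circle (shape 16) for Experimental and a cross (shape 4) for Control makes the two kinds of points visually distinct. The full set of shape codes (0 to 25) is listed in the ggplot2 aesthetic specifications vignette, vignette("ggplot2-specs").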
Note that if there are many different values in your grouping variables (e.g. 5 categories along the x-axis, 6 different colours, 20 facet combinations, etc.), or many points within each facet, the plot will look very busy, and you may want to split into separate plots rather than keep everything together.
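If you would rather have one separate plot per measurement, as the question asks, a minimal sketch is to split the data by measure and build one faceted plot per piece. plot_measure() is a hypothetical helper name, and faceting each plot on sex and kind is just one possible layout. Because the plots are independent, each measurement also gets its own y axis, which helps if, in real data, temperature and volume are on very different scales.

```r
library(ggplot2)
library(tibble)

set.seed(42)
mydf <- tibble(
  grp = rep(c('A', 'B'), length.out = 32, each = 16),
  sex = rep(c('M', 'F'), length.out = 32, each = 8),
  cond = rep(c('Wet', 'Dry'), length.out = 32, each = 4),
  measure = rep(c('Temperature', 'Volume'), length.out = 32, each = 2),
  kind = rep(c('Experimental', 'Control'), length.out = 32, each = 1),
  value = rnorm(32) * 100
)

# Hypothetical helper: build one faceted plot for a single measurement
plot_measure <- function(df, title) {
  ggplot(df, aes(x = grp, y = value, colour = cond)) +
    geom_point(size = 3) +
    facet_grid(sex ~ kind) +
    labs(title = title)
}

# One data frame per measurement, then one plot per data frame
by_measure <- split(mydf, mydf$measure)
plots <- Map(plot_measure, by_measure, names(by_measure))
```

Print plots$Temperature and plots$Volume individually, or place them side by side with patchwork::wrap_plots(plots).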

Related

Setting per-column y axis limits with facet_grid

Using R and ggplot2, I am plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial sample (long data suitable for plotting) is this:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
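which gives a grid of line plots with 7 rows (groups2) and 2 columns (groups1), all sharing a single Y axis range.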
In this example, the first variable (shown in the left column) can take any real value, while the second variable (shown in the right column) is always positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns, i.e. by setting the Y axis limits separately for the two variables. However, to allow easy visual comparison of the different groups for each of the two variables, I would also like the Y axes to be identical within each column.
I've looked at the scales option of facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_y" allows the Y axes to vary across rows, while
passing scales = "free_x" allows the X axes to vary across columns, but
there is no option to allow the Y axes to vary across columns (nor the X axes across rows).
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest approach would be to create one plot per facet column and combine them with something like {patchwork}. To keep the facet look, you can still add a faceting layer to each plot.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
group_split(groups1) %>%
map(~ ggplot(.x, aes(x = x, y = values)) +
geom_line() +
facet_grid(groups2 ~ groups1)) %>%
wrap_plots()
Created on 2023-01-11 with reprex v2.0.2
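If you'd rather stay within a single ggplot2 call, one alternative (a sketch of my own, not part of the answer above) is facet_wrap() with scales = "free_y" plus an invisible geom_blank() layer. "free_y" gives every panel its own Y axis, and the blank layer carries each column's minimum and maximum, so all panels in a column end up trained on the same range. column_limits is an illustrative name. A trade-off is that every panel draws its own Y axis and the strips are facet_wrap-style labels rather than grid-style row and column strips.

```r
library(ggplot2)
library(tibble)

set.seed(42)
data <- tibble(
  x = rep(1:100, times = 14),
  groups1 = rep(1:2, each = 7 * 100),
  groups2 = rep(rep(1:7, times = 2), each = 100),
  values = c(rnorm(n = 700), rgamma(n = 700, shape = 2))
)

# Min and max of `values` for each column (groups1). The table has no groups2
# column, so ggplot2 repeats it in every groups2 panel of that column.
column_limits <- do.call(rbind, lapply(split(data, data$groups1), function(d) {
  data.frame(groups1 = d$groups1[1], values = range(d$values))
}))

p <- ggplot(data, aes(x = x, y = values)) +
  geom_line() +
  # Draws nothing; only expands each panel's y scale to its column's range
  geom_blank(data = column_limits, aes(y = values), inherit.aes = FALSE) +
  # Free y per panel; dir = "v" fills the left column with groups1 = 1 first
  facet_wrap(~ groups1 + groups2, ncol = 2, dir = "v",
             scales = "free_y", labeller = label_both)
```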

How to specify unique geom assignments to facets?

Below I have simulated a dataset where an assignment was given to 5 groups of individuals on 5 different days (a new group of 200 individuals each day). TrialStartDate denotes the date on which the assignment was given to each individual (ID), and TrialFinishDate denotes when each individual finished the assignment.
set.seed(123)
data <-
data.frame(
TrialStartDate = rep(c(sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by="day"), 5)), each = 200),
TrialFinishDate = sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by = "day"), 1000,replace = T),
ID = seq(1,1000, 1)
)
I am interested in comparing how long individuals took to complete the trial depending on when they started the trial (i.e., assuming TrialStartDate has an effect on the length of time it takes to complete the trial).
To visualize this, I want to make a barplot showing counts of IDs on each TrialFinishDate where bars are colored by TrialStartDate (since each TrialStartDate acts as a grouping variable). The best I have come up with so far is by faceting like this:
data%>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
facet_wrap(~TrialStartDate, ncol = 1)
However, I also want to add a vertical line to each facet showing when the TrialStartDate was for each group (preferably colored the same as the bars). When attempting to add vertical lines with geom_vline, it adds all the lines to each facet:
data%>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
geom_vline(xintercept = unique(data$TrialStartDate))+
facet_wrap(~TrialStartDate, ncol = 1)
How can we make the vertical lines unique to the respective group in each facet?
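You're passing xintercept as a fixed vector outside aes(), so every panel draws all five lines and the faceting variable is never consulted for that layer. Mapping it inside aes() makes geom_vline() take xintercept from the plot data, which is split across panels just like the bars.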
This should do the trick:
library(dplyr)
library(ggplot2)
data %>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
geom_vline(aes(xintercept = TrialStartDate))+
facet_wrap(~TrialStartDate, ncol = 1)
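The key change is geom_vline(aes(xintercept = TrialStartDate)). Because the line also inherits the colour mapping from ggplot(), it is coloured the same as the bars in its panel.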
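A variant (my own sketch, not from the answer above): the answer's layer draws one line per row of the counted data, so several identical lines are stacked in each panel. Passing geom_vline() a small data frame with one row per start date draws exactly one line per facet. counts and starts are illustrative names. inherit.aes = FALSE is needed because starts has no TrialFinishDate or n column, so the colour mapping is repeated explicitly. geom_col() is equivalent to geom_bar(stat = "identity").

```r
library(ggplot2)
library(dplyr)

set.seed(123)
data <- data.frame(
  TrialStartDate = rep(sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by = "day"), 5), each = 200),
  TrialFinishDate = sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by = "day"), 1000, replace = TRUE),
  ID = seq(1, 1000, 1)
)

# Bar heights: number of IDs per start/finish date combination
counts <- count(data, TrialStartDate, TrialFinishDate)

# One row per group, holding the date that group's panel should mark
starts <- distinct(data, TrialStartDate)

p <- ggplot(counts, aes(x = TrialFinishDate, y = n,
                        colour = factor(TrialStartDate),
                        fill = factor(TrialStartDate))) +
  geom_col() +
  # `starts` lacks TrialFinishDate and n, so do not inherit the plot mapping;
  # repeat the colour mapping so each line matches its panel's bars
  geom_vline(data = starts,
             aes(xintercept = TrialStartDate, colour = factor(TrialStartDate)),
             inherit.aes = FALSE) +
  facet_wrap(~ TrialStartDate, ncol = 1)
```

Print p to see the result.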

Nested legend based on colour and shape

I want to make an xy plot of nested groups (Group and Subgroup) where points are colored by Group and have shape by Subgroup. A minimal example is below:
DATA<-data.frame(
Group=c(rep("group1",10),rep("group2",10),rep("group3",10) ),
Subgroup = c(rep(c("1.1","1.2"),5), rep(c("2.1","2.2"),5), rep(c("3.1","3.2"),5)),
x=c(rnorm(10, mean=5),rnorm(10, mean=10),rnorm(10, mean=15)),
y=c(rnorm(10, mean=3),rnorm(10, mean=4),rnorm(10, mean=5))
)
ggplot(DATA, aes(x=x, y=y,colour=Group, shape=Subgroup) ) +
geom_point(size=3)
However, because in reality I have many more subgroups than can easily be identified with the available shapes, I want to repeat the same shapes within each Group. Below is the same code but with an additional column (Shape) specifying the shape:
DATA<-data.frame(
Group=c(rep("group1",10),rep("group2",10),rep("group3",10) ),
Subgroup = c(rep(c("1.1","1.2"),5), rep(c("2.1","2.2"),5), rep(c("3.1","3.2"),5)),
Shape = as.character(c(rep(c(1,2),15) ) ),
x=c(rnorm(10, mean=5),rnorm(10, mean=10),rnorm(10, mean=15)),
y=c(rnorm(10, mean=3),rnorm(10, mean=4),rnorm(10, mean=5))
)
ggplot(DATA, aes(x=x, y=y,colour=Group, shape=Shape) ) +
geom_point(size=3)
Now the shapes and colours are as I want them. However, the legend no longer lists the subgroups. What I want is a legend that lists all subgroups under each respective Group. Something like:
Group1
1.1
1.2
Group2
2.1
2.2
Group3
3.1
3.2
(Ideally, this would be a single nested legend. If nested legends are not possible, perhaps they can be three separate legends with the Groups as titles)
Is this something that can be achieved, and how?
Thanks
One option to achieve your desired result would be via the ggnewscale package which allows for multiple scales and legends for the same aesthetic.
To this end we have to:
split the data by Group and plot each group via a separate geom_point layer;
give each group its own shape scale and legend, which we achieve via ggnewscale::new_scale();
set the colour for each group as a fixed argument, taken from a named vector of colours, instead of mapping it via the colour aesthetic;
use purrr::imap() to loop over the split dataset and add the layers dynamically, instead of copying and pasting the code for each group.
One more note: by default the order of the legends is set via a "magic algorithm". To get the groups in the right order we have to set the order explicitly via guide_legend().
library(ggplot2)
library(ggnewscale)
library(dplyr)
library(purrr)
library(tibble)
DATA_split <- split(DATA, DATA$Group)
# Vector of colors and shapes
colors <- setNames(scales::hue_pal()(length(DATA_split)), names(DATA_split))
shapes <- setNames(scales::shape_pal()(length(unique(DATA$Shape))), unique(DATA$Shape))
ggplot(mapping = aes(x = x, y = y)) +
purrr::imap(DATA_split, function(x, y) {
# Get Labels
labels <- x[c("Shape", "Subgroup")] %>%
distinct(Shape, Subgroup) %>%
deframe()
# Get order
order <- as.numeric(gsub("^.*?(\\d+)$", "\\1", y))
list(
geom_point(data = x, aes(shape = Shape), color = colors[[y]], size = 3),
scale_shape_manual(values = shapes, labels = labels, name = y, guide = guide_legend(order = order)),
new_scale("shape")
)
})
The DATA used above was generated with:
set.seed(123)
DATA <- data.frame(
Group = c(rep("group1", 10), rep("group2", 10), rep("group3", 10)),
Subgroup = c(rep(c("1.1", "1.2"), 5), rep(c("2.1", "2.2"), 5), rep(c("3.1", "3.2"), 5)),
Shape = as.character(c(rep(c(1, 2), 15))),
x = c(rnorm(10, mean = 5), rnorm(10, mean = 10), rnorm(10, mean = 15)),
y = c(rnorm(10, mean = 3), rnorm(10, mean = 4), rnorm(10, mean = 5))
)
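If a single flat legend is acceptable instead of nested headings, a simpler sketch without ggnewscale is to map both colour and shape to Subgroup and supply manual values that repeat one colour within each Group and reuse the shapes across groups. Because both scales have the same title and breaks, ggplot2 merges them into one legend listing 1.1, 1.2, 2.1, ... each drawn in its group's colour. The Group headings from the question are not reproduced. lookup, group_cols, sub_cols and sub_shapes are illustrative names, and shapes 16 and 17 are arbitrary choices.

```r
library(ggplot2)

set.seed(123)
DATA <- data.frame(
  Group = c(rep("group1", 10), rep("group2", 10), rep("group3", 10)),
  Subgroup = c(rep(c("1.1", "1.2"), 5), rep(c("2.1", "2.2"), 5), rep(c("3.1", "3.2"), 5)),
  Shape = as.character(c(rep(c(1, 2), 15))),
  x = c(rnorm(10, mean = 5), rnorm(10, mean = 10), rnorm(10, mean = 15)),
  y = c(rnorm(10, mean = 3), rnorm(10, mean = 4), rnorm(10, mean = 5))
)

# One row per subgroup, with its parent group and shape code
lookup <- unique(DATA[c("Group", "Subgroup", "Shape")])

# One hue per group, repeated for every subgroup of that group
group_cols <- setNames(scales::hue_pal()(length(unique(lookup$Group))),
                       unique(lookup$Group))
sub_cols <- setNames(group_cols[lookup$Group], lookup$Subgroup)

# Shapes reused across groups (arbitrary choice: 16 = circle, 17 = triangle)
sub_shapes <- setNames(c(16, 17)[as.integer(lookup$Shape)], lookup$Subgroup)

# Same variable and title on both scales, so the legends merge into one
p <- ggplot(DATA, aes(x = x, y = y, colour = Subgroup, shape = Subgroup)) +
  geom_point(size = 3) +
  scale_colour_manual(values = sub_cols) +
  scale_shape_manual(values = sub_shapes)
```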

Generate heatmap in R (multiple independent variable)

There are a few similar questions but they are not asking what I am looking for.
I have gene expression data with multiple independent variables. I want to visualize it using a heatmap in R, but I am not able to include all three variables together on the heatmap. Below is the example code:
species <- rep(c("st", "rt"), each = 18)
life <- rep(c("5d", "15d", "45d"), 2, each = 6)
concentration <- rep(c("c1", "c2", "c3"), 6, each = 2)
gene <- rep(c("gene1", "gene2"), times = 18)
response <- runif(36, -4, 4)
data1 <- data.frame(species, life, concentration, gene, response)
I am open to using any package. I wish to visualize my data in a similar manner to an example image made from a different dataset.
[Image: example_data_visualized]
Many thanks in advance!
I am not sure which of the variables in your code correspond to which of the dimensions in your chart but, using the ggplot2 package, it's quite easy to do it:
library(ggplot2)
ggplot(data1, aes(x = factor(life, levels = c("5d", "15d", "45d")),
y = concentration,
fill = response)) +
geom_tile() +
facet_wrap(~species + gene, nrow = 1) +
scale_fill_gradient(low = "red", high = "green", guide = "none") +
scale_x_discrete(name = "life")
Of course, you can adjust the titles, labels, colours etc accordingly.
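Since response is drawn symmetrically from -4 to 4, a diverging palette centred on 0 may read more easily than a red-to-green gradient, and it avoids the red/green pair that many colour-blind readers cannot separate. The sketch below uses gene as a vector of length 36, so that data1 has exactly one row per combination, and places gene and species on a facet grid. The blue/white/red colours, the 0 midpoint, and the facet layout are my assumptions, not part of the question or the answer.

```r
library(ggplot2)

set.seed(1)
species <- rep(c("st", "rt"), each = 18)
life <- rep(c("5d", "15d", "45d"), 2, each = 6)
concentration <- rep(c("c1", "c2", "c3"), 6, each = 2)
gene <- rep(c("gene1", "gene2"), times = 18)  # length 36, like the others
response <- runif(36, -4, 4)
data1 <- data.frame(species, life, concentration, gene, response)

# Keep the time points in chronological rather than alphabetical order
data1$life <- factor(data1$life, levels = c("5d", "15d", "45d"))

p <- ggplot(data1, aes(x = life, y = concentration, fill = response)) +
  geom_tile(colour = "white") +
  facet_grid(gene ~ species) +
  # Diverging scale: white at 0, blue for negative, red for positive values
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-4, 4))
```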

Adding lines to grouped boxplots

I have a dataset with 3 factors (Parent.organization, Hierarchy, variable) as well as a metric variable (value) and could use some help. Here is some sample data of the same style:
sampleData <- data.frame(id = 1:100,
Hierarchy = sample(c("Consultant", "Registrar", "Intern", "Resident"), 100, replace = TRUE),
Parent.organization = sample(c("Metropolitan", "Regional"), 100, replace = TRUE),
variable = sample(c("CXR", "AXR", "CTPA", "CTB"), 100, replace = TRUE),
value = rlnorm(100, log(10), log(2.5)))
summary(sampleData)
Using the following code I get a grouped boxplot with a log10 Y axis and one panel per Parent.organization:
library(ggplot2)
library(scales)
p0 = ggplot(sampleData, aes(x = Hierarchy, y = value, fill = variable)) +
geom_boxplot()
plog = p0 + scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x))) +
theme_bw() +
facet_grid(.~Parent.organization, scales = "free", space = "free")
I have a set of values I want to mark for each scan variable (they are the same across all levels of the hierarchy and represent true values). Let's say they are 3, 5, 7, 5 for AXR, CTB, CTPA, CXR respectively. I want these overlaid on top, but I am unsure how to proceed.
I'm after something akin to a mark at the true value on each box (in my mock-up I filled in only the first two, but the same pattern would apply across the board).
My knowledge of R is improving but I'd say I'm still fairly inept. Also any suggestions on how to improve my question are also very welcome.
First, you have to make a new data frame for the lines, with the same grouping and faceting variables as the original data frame. The true values have to be repeated for all combinations of those variables.
true.df<-data.frame(Hierarchy =rep(rep(c("Consultant", "Registrar", "Intern", "Resident"),each=4),times=2),
Parent.organization = rep(c("Metropolitan", "Regional"),each=16),
variable = rep(c("AXR", "CTB", "CTPA", "CXR"),times=8),
true.val=rep(c(3,5,7,5),times=8))
Then you can use geom_crossbar() to add the lines. Use true.val for y, ymin and ymax so that each crossbar collapses to a horizontal line. position = position_dodge() ensures that the lines are dodged like the boxes, and show.legend = FALSE (called show_guide in older ggplot2 versions) ensures that the legend isn't affected.
plog + geom_crossbar(data = true.df,
aes(x = Hierarchy, y = true.val, ymin = true.val,
ymax = true.val, fill = variable),
show.legend = FALSE, position = position_dodge(), color = "red")
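Building true.df with nested rep() calls is easy to get out of step. A sketch that uses expand.grid(), which creates every combination of its arguments, plus a named lookup vector for the true values produces the same 32 combinations, only in a different row order (which does not matter to ggplot2). true_vals is an illustrative name. The result can be passed as data = true.df to the geom_crossbar() call unchanged.

```r
# True value per scan type, as given in the question
true_vals <- c(AXR = 3, CTB = 5, CTPA = 7, CXR = 5)

# Every Hierarchy x Parent.organization x variable combination (4 * 2 * 4 = 32)
true.df <- expand.grid(
  Hierarchy = c("Consultant", "Registrar", "Intern", "Resident"),
  Parent.organization = c("Metropolitan", "Regional"),
  variable = names(true_vals),
  stringsAsFactors = FALSE
)

# Look up each row's true value by its scan type
true.df$true.val <- unname(true_vals[true.df$variable])
```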
