Subsetting ggplot2 graph using facet_grid() - r

I am trying to get individual trajectories and fitted trajectory per group across repeated measurements.
Toy data below:
set.seed(124)
ID <- factor(rep(1:21, times = 3))
Group <- rep(c("A", "B", "C"), times = 21)
score <- rnorm(63, 25, 3)
session <- rep(c("s1","s2", "s3"), each = 21)
df <- data.frame(ID, Group, session, score)
Now plot trajectories across the three repeated measures for each individual and derive a fitted slope for the whole sample.
library(ggplot2)
c <- ggplot(df, aes(x = session, y = score, group = ID, colour = ID)) +
  geom_smooth(method = "lm", se = FALSE) +
  stat_smooth(aes(group = 1), se = FALSE, method = "lm", color = "red")
c
Now I want to break this plot up into three plots by group. There is the long way, where you subset the data frame by group and draw three separate graphs. However, I would like to do it all in one graph, same as above, except separated by group. I tried:
c + facet_grid(.~Group)
But it comes out blank. Something is missing here and I don't know what it is.

Related

How to specify unique geom assignments to facets?

Below I have simulated a dataset where an assignment was given to 5 groups of individuals on 5 different days (a new group with 200 new individuals each day). TrialStartDate denotes the date on which the assignment was given to each individual (ID), and TrialFinishDate denotes when each individual finished the assignment.
set.seed(123)
data <-
  data.frame(
    TrialStartDate = rep(c(sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by = "day"), 5)), each = 200),
    TrialFinishDate = sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by = "day"), 1000, replace = TRUE),
    ID = seq(1, 1000, 1)
  )
I am interested in comparing how long individuals took to complete the trial depending on when they started the trial (i.e., assuming TrialStartDate has an effect on the length of time it takes to complete the trial).
To visualize this, I want to make a barplot showing counts of IDs on each TrialFinishDate where bars are colored by TrialStartDate (since each TrialStartDate acts as a grouping variable). The best I have come up with so far is by faceting like this:
library(dplyr)
library(ggplot2)
data %>%
  group_by(TrialStartDate, TrialFinishDate) %>%
  count() %>%
  ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate))) +
  geom_bar(stat = "identity") +
  facet_wrap(~TrialStartDate, ncol = 1)
However, I also want to add a vertical line to each facet showing when the TrialStartDate was for each group (preferably colored the same as the bars). When attempting to add vertical lines with geom_vline, it adds all the lines to each facet:
data %>%
  group_by(TrialStartDate, TrialFinishDate) %>%
  count() %>%
  ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate))) +
  geom_bar(stat = "identity") +
  geom_vline(xintercept = unique(data$TrialStartDate)) +
  facet_wrap(~TrialStartDate, ncol = 1)
How can we make the vertical lines unique to the respective group in each facet?
You're specifying xintercept outside of aes, so the faceting is not respected.
This should do the trick:
data %>%
  group_by(TrialStartDate, TrialFinishDate) %>%
  count() %>%
  ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate))) +
  geom_bar(stat = "identity") +
  geom_vline(aes(xintercept = TrialStartDate)) +
  facet_wrap(~TrialStartDate, ncol = 1)
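Note the change to geom_vline(aes(xintercept = TrialStartDate)): xintercept is now mapped inside aes(), so each facet draws only the start date that belongs to its own subset of the data.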
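A variation on the answer above, offered as a sketch rather than the definitive approach. The ggplot2 documentation notes that the reference-line geoms (geom_vline, geom_hline, geom_abline) do not inherit aesthetics from the plot defaults, so the colour has to be mapped inside geom_vline() itself if the line should match the bars. Passing a small data frame with one row per start date also avoids drawing one overlapping line per row of the counted data. The names `counts` and `starts` are my own, not from the original post.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
days <- seq(as.Date("2019-02-01"), as.Date("2019-02-15"), by = "day")
data <- data.frame(
  TrialStartDate  = rep(sample(days, 5), each = 200),
  TrialFinishDate = sample(days, 1000, replace = TRUE),
  ID = seq_len(1000)
)

# Number of IDs per start date / finish date combination
counts <- count(data, TrialStartDate, TrialFinishDate)

# One row per facet: the start date is both the facet variable and the line position
starts <- data.frame(TrialStartDate = unique(data$TrialStartDate))

p <- ggplot(counts, aes(x = TrialFinishDate, y = n,
                        colour = factor(TrialStartDate),
                        fill = factor(TrialStartDate))) +
  geom_col() +
  # colour is mapped here explicitly because geom_vline() ignores the plot-level aes()
  geom_vline(data = starts,
             aes(xintercept = TrialStartDate, colour = factor(TrialStartDate))) +
  facet_wrap(~ TrialStartDate, ncol = 1)
p
```

Because `starts` contains the faceting variable, each line is routed to its own panel only.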

R ggplot2 multiple boxplots stat

I have a plot, similar to the one in the picture (taken from here):
library(ggplot2)
# create fake dataset with additional attributes - sex, sample, and temperature
x <- data.frame(
values = c(runif(100, min = -2), runif(100), runif(100, max = 2), runif(100)),
sex = rep(c('M', 'F'), each = 100),
sample = rep(c('sample_a', 'sample_b'), each = 200),
temperature = sample(c('15C', '25C', '30C', '42C'), 400, replace = TRUE)
)
# compare different sample populations across various temperatures
ggplot(x, aes(x = sample, y = values, fill = sex)) +
geom_boxplot() +
facet_wrap(~ temperature)
For each sample (sample_a and sample_b), I want a statistical comparison (Wilcoxon test) of the F and M groups against an additional set of expected data.
I've tried adding the expected data as another boxplot next to the F and M boxes, or as points over the data, but with neither option could I figure out how to do the statistical analysis using ggplot2 stat layers.
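Since the question is about running the comparison itself, here is a minimal sketch that computes the F versus M Wilcoxon rank-sum test for every sample and temperature combination outside of ggplot2, using base R's wilcox.test(). It does not cover the comparison against the "expected" data, whose structure the question does not show. The names `cells` and `pvals` are my own, and the seed is arbitrary.

```r
set.seed(1)
x <- data.frame(
  values = c(runif(100, min = -2), runif(100), runif(100, max = 2), runif(100)),
  sex = rep(c("M", "F"), each = 100),
  sample = rep(c("sample_a", "sample_b"), each = 200),
  temperature = sample(c("15C", "25C", "30C", "42C"), 400, replace = TRUE)
)

# One data frame per sample x temperature cell (2 samples * 4 temperatures = 8 cells)
cells <- split(x, list(x$sample, x$temperature), drop = TRUE)

# Two-sided Wilcoxon rank-sum test of F vs M within each cell
pvals <- do.call(rbind, lapply(cells, function(d) {
  data.frame(
    sample = d$sample[1],
    temperature = d$temperature[1],
    p_value = wilcox.test(values ~ sex, data = d)$p.value
  )
}))
pvals
```

The resulting table has one row per boxplot pair, so it could be added to the faceted plot as text labels, for example with geom_text() and `pvals` as that layer's data.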

Plotting multiple box plots as a single graph in R

I am trying to plot multiple box plots in a single graph, using the data on which I ran a Wilcoxon test. It should look like this:
I have four/five questions and I want to plot the respondent score for two sets as a box plot. This should be done for all questions (Two groups for each question).
I am thinking of using ggplot2. My data is like
q1o <- c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4)
q1s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4)
q2o <- c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4)
q2s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
....
....
q1 means question 1 and q2 means question 2. I also want to know how to arrange these box plots as needed, for example in one row or two rows.
This should get you started:
Unfortunately you don't provide a minimal example with sample data, so I will generate some random sample data.
# Generate sample data
set.seed(2017)
df <- cbind.data.frame(
  value = rnorm(1000),
  Label = sample(c("Good", "Bad"), 1000, replace = TRUE),
  variable = sample(paste0("F", 5:11), 1000, replace = TRUE))
# ggplot
library(tidyverse)
df %>%
  mutate(variable = factor(variable, levels = paste0("F", 5:11))) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
You can specify the number of columns and rows in your 2d panel layout through arguments ncol and nrow, respectively, of facet_wrap. Many more details and examples can be found if you follow ?geom_boxplot and ?facet_wrap.
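To illustrate the one-row versus two-row layout the question asks about, here is a small sketch using the same kind of simulated data. The object names `base`, `p_one_row` and `p_two_rows` are mine.

```r
library(ggplot2)

set.seed(2017)
df <- data.frame(
  value = rnorm(1000),
  Label = sample(c("Good", "Bad"), 1000, replace = TRUE),
  variable = factor(sample(paste0("F", 5:11), 1000, replace = TRUE),
                    levels = paste0("F", 5:11))
)

base <- ggplot(df, aes(Label, value, fill = Label)) +
  geom_boxplot()

# All seven panels side by side in a single row
p_one_row <- base + facet_wrap(~ variable, nrow = 1)

# Two rows; facet_wrap() works out the number of columns itself (4 for seven panels)
p_two_rows <- base + facet_wrap(~ variable, nrow = 2)

p_two_rows
```

Only one of nrow and ncol needs to be given; the other is derived from the number of panels.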
Update 1
A boxplot based on your sample data doesn't make too much sense, because your data are not continuous. But ignoring that, you could do the following:
df <- data.frame(
  q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
  q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
  q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
  q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4))
# Reshape to long format, then derive the question (variable) and the group (Label)
df2 <- df %>%
  gather(key, value, 1:4) %>%
  mutate(
    variable = ifelse(grepl("q1", key), "F1", "F2"),
    Label = ifelse(grepl("o$", key), "Bad", "Good"))
df2 %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
Update 2
One way of visualising discrete data would be a mosaic plot.
# df2 was not defined in the original answer; it is assumed here to be the
# long-format data built in Update 1
mosaicplot(table(df2[c("variable", "Label", "value")]))
The plot shows the count of value (as filled rectangles) per variable per Label. See ?mosaicplot for details.

Generate heatmap in R (multiple independent variable)

There are a few similar questions but they are not asking what I am looking for.
I have a gene expression data with multiple independent variables. I want to visualize it using a heatmap in R. I am not able to include all the three variables together on the heatmap. Below is the example code:
species <- rep(c("st", "rt"), each = 18)
life <- rep(c("5d", "15d", "45d"), 2, each = 6)
concentration <- rep(c("c1", "c2", "c3"), 6, each = 2)
gene <- rep(c("gene1", "gene2"), times = 18)  # 36 values, matching the other vectors
response <- runif(36, -4, 4)
data1 <- data.frame(species, life, concentration, gene, response)
I am open to use any package. Please see below image which is from a different dataset. I wish to visualize my data in a similar manner.
example_data_visualized
Many thanks in advance!
I am not sure which of the variables in your code correspond to which of the dimensions in your chart but, using the ggplot2 package, it's quite easy to do it:
library(ggplot2)
ggplot(data1, aes(x = factor(life, levels = c("5d", "15d", "45d")),
y = concentration,
fill = response)) +
geom_tile() +
facet_wrap(~species + gene, nrow = 1) +
scale_fill_gradient(low = "red", high = "green", guide = "none") +
scale_x_discrete(name = "life")
Of course, you can adjust the titles, labels, colours etc accordingly.
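One thing to watch with geom_tile() is that each tile should correspond to exactly one row of data. As a sketch (my own construction, not the asker's code), expand.grid() builds the full design of species, life, concentration and gene directly and makes the row count easy to verify. The name `design` is hypothetical and the seed is arbitrary.

```r
set.seed(42)

# Every combination of the four factors: 2 * 3 * 3 * 2 = 36 rows
design <- expand.grid(
  gene = c("gene1", "gene2"),
  concentration = c("c1", "c2", "c3"),
  life = c("5d", "15d", "45d"),
  species = c("st", "rt"),
  stringsAsFactors = FALSE
)
design$response <- runif(nrow(design), -4, 4)

# 0 means no combination appears twice, i.e. one response per tile
anyDuplicated(design[c("species", "life", "concentration", "gene")])
nrow(design)
```

A data frame built this way can be passed to the ggplot() call above in place of data1.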

Plotting regressions from slope and intercept (lattice or ggplot2)

I have a microarray dataset on which I performed a limma lmFit() test. If you haven't heard of it before, it's a powerful linear model package that tests differential gene expressions for >20k genes. You can extract the slope and intercept from the model for each one of these genes.
My problem is: given a table of slope and intercept values, how do I match a plot (I don't mind either ggplot2's geom_abline, lattice's panel.abline, or an alternative if necessary) with its corresponding slope and intercept?
My table (call it "slopeInt") has intercept as column 1 and slope as column 2, and has row names that correspond to the name of the gene. Their names look like this:
"202586_at" "202769_at" "203201_at" "214970_s_at" "219155_at"
These names match my gene names in another table ("Data") containing some details about my samples (I have 24 samples with different IDs and Time/Treatment combination) and the gene expression values.
It's in the long format with the gene names (as above) repeating every 24 rows (different expression levels for same gene, for each one of my samples):
ID Time Treatment Gene_name Gene_exp
... ... ... ... ...
I have eight genes overall that I'm interested in plotting, and the names in Data$Gene_name match the row names of my slopeInt table. I can also merge the two tables together, that's not a problem. But I tried the following two approaches to get one graph per gene with the appropriate regression line, to no avail:
Using ggplot2:
ggplot(Data, aes(x = Time, y = Gene_exp, group = Time, color = Treatment)) +
facet_wrap(~ Gene_name, scales = "free_x") +
geom_point() +
geom_abline(intercept = Intercept, slope = Time, data = slopeInt) +
theme(panel.grid.major.y = element_blank())
And also using Lattice:
xyplot(Gene_exp ~ Time| Gene_name, Data,
jitter.data = T,
panel = function(...){
panel.xyplot(...)
panel.abline(a = slopeInt[,1], b = slopeInt[,2])},
layout = c(4, 2))
I've tried multiple other methods in the actual geom_abline() and panel.abline() arguments, including some for loops, but I am not experienced in R and I cannot get it to work. I can also provide the data file in wide format (separate columns for each gene).
Any help and further directions will be greatly appreciated!!!
Here is some code for a reproducible example:
Data <- data.frame(
ID = rep(1:24, 8),
Time = (rep(rep(c(1, 2, 4, 24), each = 3), 8)),
Treatment = rep(rep(c("control", "smoking"), each = 12), 8),
Gene_name = rep(c("202586_at", "202769_at", "203201_at", "214970_s_at",
"219155_at", "220165_at", "224483_s_at", "227559_at"), each = 24),
Gene_exp = rnorm(192))
slopeInt <- data.frame(
Intercept = rnorm(8),
Slope = rnorm(8))
row.names(slopeInt) <- c("202586_at", "202769_at", "203201_at",
"214970_s_at", "219155_at", "220165_at", "224483_s_at", "227559_at")
With lattice, this should work
xyplot(Gene_exp ~ Time| Gene_name, Data, slopeInt=slopeInt,
jitter.data = T,
panel = function(..., slopeInt){
panel.xyplot(...)
grp <- trellis.last.object()$condlevels[[1]][which.packet()]
panel.abline(a = slopeInt[grp,1], b = slopeInt[grp,2])
},
layout = c(4, 2)
)
Using set.seed(15) before generating the sample data results in the following plot.
The "trick" here is to use trellis.last.object()$condlevels to determine which conditioning block we are currently in. Then we use that information to extract the right slope information from the additional data we now pass in via a parameter. I thought there was a more elegant way to determine the current values of the conditioning variables but if there is I cannot remember it at this time.
If you specify Gene_name as a column in slopeInt, then it works [as I understand you want it to]. Note also a few other changes to the ggplot call.
slopeInt$Gene_name <- rownames(slopeInt)
ggplot(Data, aes(x = Time, y = Gene_exp, color = Treatment)) +
facet_wrap(~ Gene_name, scales = "free_x") +
geom_point() +
geom_abline(aes(intercept = Intercept, slope = Slope), data = slopeInt) +
theme(panel.grid.major.y = element_blank())
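For readers without a limma fit at hand, the following sketch builds a table shaped like slopeInt (including the Gene_name column) from ordinary per-gene lm() fits, purely as a stand-in for the limma coefficients, and checks that every facet has a matching row before plotting. The variable names `genes` and `fits` are my own.

```r
set.seed(15)
genes <- c("202586_at", "202769_at", "203201_at", "214970_s_at",
           "219155_at", "220165_at", "224483_s_at", "227559_at")
Data <- data.frame(
  ID = rep(1:24, 8),
  Time = rep(rep(c(1, 2, 4, 24), each = 3), 16),
  Treatment = rep(rep(c("control", "smoking"), each = 12), 8),
  Gene_name = rep(genes, each = 24),
  Gene_exp = rnorm(192)
)

# Fit Gene_exp ~ Time separately for each gene (a stand-in for the limma output)
fits <- lapply(split(Data, Data$Gene_name),
               function(d) coef(lm(Gene_exp ~ Time, data = d)))

slopeInt <- data.frame(
  Gene_name = names(fits),
  Intercept = unname(vapply(fits, function(b) b[["(Intercept)"]], numeric(1))),
  Slope = unname(vapply(fits, function(b) b[["Time"]], numeric(1)))
)

# Every facet level needs exactly one row of coefficients
stopifnot(setequal(slopeInt$Gene_name, unique(Data$Gene_name)))
slopeInt
```

A table in this shape can be passed to geom_abline(aes(intercept = Intercept, slope = Slope), data = slopeInt), and facet_wrap(~ Gene_name) then places each line in the panel of its own gene.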

Resources