Time series plot with groups using ggplot2 - r

I have an experiment in which three evolving populations of yeast have been studied over time. At discrete time points, we measured their growth, which is the response variable. I want to plot the growth of yeast as a time series, using boxplots to summarise the measurements taken at each point, and plotting each of the three populations separately. Something that looks like this (as a newbie, I don't get to post actual images, so x, y, z refer to the three replicates):
| xyz
| x z xyz
| y xyz
| xyz y
| x z
|
-----------------------
t0 t1 t2
How can this be done using ggplot2? I have a feeling that there must be a simple and elegant solution, but I can't find it.

Try this code:
library(ggplot2)
# Simulated data: 3 time points x 3 groups x 10 measurements = 90 rows
df <- data.frame(
time = rep(seq(Sys.Date(), length.out = 3, by = "1 day"), 30),
y = rep(1:3, 10, each = 3) + rnorm(90),
group = rep(c("x", "y", "z"), 10, each = 3)
)
# Convert the dates to a factor so that each date is a discrete x position
df$time <- factor(format(df$time, format = "%Y-%m-%d"))
p <- ggplot(df, aes(x = time, y = y, fill = group)) + geom_boxplot()
print(p)
Using x = factor(time) on its own, as in ggplot(df, aes(x = factor(time), y = y, fill = group)) + geom_boxplot() + scale_x_date(), did not work.
Converting the dates beforehand with factor(format(df$time, format = "%Y-%m-%d")) was required for this kind of plot.
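If you would rather keep a real date axis, one possible alternative (a sketch, not part of the original answer) is to leave time as a Date and tell ggplot2 explicitly which rows belong to one box through the group aesthetic. The simulated data and the fixed start date below are assumptions made only for illustration.

```r
library(ggplot2)

set.seed(1)
# Assumed toy data: 3 dates x 3 groups x 10 measurements
dates <- seq(as.Date("2024-01-01"), by = "1 day", length.out = 3)
df <- data.frame(
  time = rep(dates, each = 30),
  group = rep(rep(c("x", "y", "z"), each = 10), times = 3)
)
df$y <- as.integer(factor(df$group)) + rnorm(nrow(df))

# time stays a Date; interaction(time, group) defines one box per date and group
p <- ggplot(df, aes(x = time, y = y, fill = group,
                    group = interaction(time, group))) +
  geom_boxplot() +
  scale_x_date(breaks = dates, date_labels = "%Y-%m-%d")
print(p)
```

With a continuous date axis the spacing between boxes reflects the actual time between measurements, which may or may not be what you want for unevenly spaced time points.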

Related

Plot multiple measurements

I have a data set with 12 individuals measured at 25 time points. I want a graph with lines grouped by individual (1-12) and measurement (A, B, or C), with the timepoints on the x axis and the value on the y axis.
The columns of my dataset look like this (so it is already in long format):
Individual (1 x 25; 2 x 25; ...) / Measurement (A, B, or C) / timepoint (1-25, 1-25, ...) / value
I already tried this:
ggplot(data = Replicate1, mapping = aes(x = Reading, y = value, linetype = Group))
but no lines are shown and I don't know how to add the measurement.
You could do something like this, shown here on sample data.
set.seed(12)
df <- data.frame(individual = rep(1:12, each =3),
obs = LETTERS[1:3],
time = rep(1:25, each = 36),
val = sample(25:100, 900, replace = TRUE))
library(tidyverse)
df %>%
ggplot(aes(x= time, y = val, group = individual, color = as.factor(individual))) +
geom_line() +
facet_wrap(. ~ obs, ncol = 1)
Created on 2021-07-03 by the reprex package (v2.0.0)
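If all lines should share a single panel instead of being faceted, a possible variation (a sketch on assumed sample data, not the original answer's code) is to map the measurement to linetype and group by the combination of individual and measurement, so that each combination gets its own line.

```r
library(ggplot2)

set.seed(12)
# Assumed long-format data: 12 individuals x 3 measurements x 25 time points
df <- expand.grid(time = 1:25, obs = c("A", "B", "C"), individual = 1:12)
df$val <- sample(25:100, nrow(df), replace = TRUE)

# One line per individual/measurement pair
p <- ggplot(df, aes(x = time, y = val,
                    colour = factor(individual),
                    linetype = obs,
                    group = interaction(individual, obs))) +
  geom_line() +
  labs(colour = "individual", linetype = "measurement")
print(p)
```

With 36 lines in one panel the plot gets crowded quickly, which is one reason the faceted version may be easier to read.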

Creating and labelling points on a geom_line() graph [duplicate]

This question already has an answer here:
label specific point in ggplot2
Basically, I want to make a graph that shows where different "analysts" chose a certain point on the graph.
This is what the base graph looks like: [image not available]
This is what I want to produce: [image not available]
I have a separate dataframe called sum_data that summarizes the time choices made by each analyst. It looks like this: [image not available] The following is the code used to create the plot:
gqplot <- ggplot(Qdata, aes(x = date, y = cfs)) +
labs(# title = paste(watershedID, "_", event),
x = "Date",
y = "Flow [cfs]") +
geom_line(colour = "#000099")
# Show plot
gqplot
What you need is a data.frame that contains the choices of those two people (like your sum_data), which you can then plot with geom_point().
Here is an example. I made up data and code etc because you didn't provide a completely reproducible example.
# Library
library(ggplot2)
# Seed for exact reproducibility
set.seed(20210307)
# Main data frame
main_data <- data.frame(x = 1:10, y = rnorm(10))
# Analyst data frame
analyst_choice <- data.frame(x = c(2, 3), y = main_data[2:3, 'y'], analyst = c('John', 'Paul'))
# Create plot
ggplot(main_data, aes(x = x, y = y)) +
geom_line() +
geom_point(data = analyst_choice, aes(x = x, y = y, colour = analyst), size = 10, shape = 4)
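This code produces a line plot with each analyst's chosen point marked by a large cross, coloured by analyst (image not available).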
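Since the question also asks about labelling the points, here is a minimal sketch that adds a text label per analyst with geom_text(). It reuses the made-up data from the answer; the point size and the vjust offset are arbitrary choices.

```r
library(ggplot2)

set.seed(20210307)
# Made-up data, as in the answer above
main_data <- data.frame(x = 1:10, y = rnorm(10))
analyst_choice <- data.frame(
  x = c(2, 3),
  y = main_data$y[2:3],
  analyst = c("John", "Paul")
)

p <- ggplot(main_data, aes(x = x, y = y)) +
  geom_line() +
  geom_point(data = analyst_choice, aes(colour = analyst), size = 4) +
  # Write each analyst's name just above the chosen point
  geom_text(data = analyst_choice, aes(label = analyst), vjust = -1)
print(p)
```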

R ggplot2 multiple boxplots stat

I have a plot, similar to the one in the picture (taken from here):
library(ggplot2)
# create fake dataset with additional attributes - sex, sample, and temperature
x <- data.frame(
values = c(runif(100, min = -2), runif(100), runif(100, max = 2), runif(100)),
sex = rep(c('M', 'F'), each = 100),
sample = rep(c('sample_a', 'sample_b'), each = 200),
temperature = sample(c('15C', '25C', '30C', '42C'), 400, replace = TRUE)
)
# compare different sample populations across various temperatures
ggplot(x, aes(x = sample, y = values, fill = sex)) +
geom_boxplot() +
facet_wrap(~ temperature)
For each sample (sample_a/b), I want a statistical comparison (Wilcoxon) of the F and M groups against additional expected data.
I've tried adding the expected data as another boxplot next to the F and M samples, and as points over the data, but with neither option could I work out how to do the statistical analysis using ggplot2 stat layers.
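One possible route, which sidesteps ggplot2 stat layers entirely, is to run the tests with base R's wilcox.test() and annotate the plot afterwards. The sketch below assumes a hypothetical numeric vector expected holding the expected data, and for brevity it ignores temperature; neither detail comes from the question.

```r
set.seed(1)
# Data shaped like the question's example (temperature omitted for brevity)
x <- data.frame(
  values = c(runif(100, min = -2), runif(100), runif(100, max = 2), runif(100)),
  sex = rep(c("M", "F"), each = 100),
  sample = rep(c("sample_a", "sample_b"), each = 200)
)
# Hypothetical expected data; replace with the real reference values
expected <- runif(100)

# Split the values by sample and sex, then test each subset against `expected`
groups <- split(x$values, list(x$sample, x$sex))
p_values <- sapply(groups, function(v) wilcox.test(v, expected)$p.value)
print(p_values)
```

The resulting named vector has one p-value per sample and sex combination, which could then be added to the plot as text, for example with geom_text() on a small summary data frame.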

ggplot facet_wrap with italics

I have a dataset I'm plotting, with facets by variables (in the toy dataset - densities of 2 species). I need to use the actual variable names to do 2 things: 1) italicize species names, and 2) have the 2 in n/m2 properly superscripted (or ASCII-ed, whichever easier).
It's similar to this, but I can't seem to make it work for my case.
toy data
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10,
z = rep(c("Species1 density (n/m2)", "Species2 density (m/m2)"), each = 5),
z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2))
ggplot(df) + geom_point(aes(x = x, y = y)) + facet_grid(z1 ~ z)
I get an error (variable z not found) when I try to use the code in the answer naively. How do I get around having 2 variables in the facetting?
A small modification gets the code from your link to work. I've changed the code to use tibble() (data_frame() is deprecated) so that the character vector is not converted to a factor, and taken the common information out of the values of z so it can be added via the labeller (otherwise it would be a pain to make half the text italic).
library(tidyverse)
df <- tibble(
x = 1:10,
y = 1:10,
z = rep(c("Species1", "Species2"), each = 5),
z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2)
)
ggplot(df) +
geom_point(aes(x = x, y = y)) +
facet_grid(z1 ~ z, labeller = label_bquote(cols = italic(.(z))~density~m^2))
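The label above drops the "n/" part of the unit. A possible variant that keeps the full unit with a superscripted 2 is sketched below; it assumes both species share the unit n/m2 (the question's second label says m/m2, which may be a typo) and uses plain data.frame() so that only ggplot2 is needed.

```r
library(ggplot2)

df <- data.frame(
  x = 1:10,
  y = 1:10,
  z = rep(c("Species1", "Species2"), each = 5),
  z1 = rep(paste0("Area", 1:5), each = 2)
)

# Column strips: italic species name, then "density (n/m^2)" in plotmath notation
p <- ggplot(df) +
  geom_point(aes(x = x, y = y)) +
  facet_grid(z1 ~ z,
             labeller = label_bquote(cols = italic(.(z)) ~ density ~ (n / m^2)))
print(p)
```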

facet_grid() causing crash

I cannot figure out what I'm missing. I keep crashing R or causing it to give very weird plots.
> head(vData)
vix.Close vstoxx vxfxi.Close Date
2011-03-16 29.40 35.2293 35.84 2011-03-16
2011-03-17 26.37 30.6133 31.77 2011-03-17
2011-03-18 24.44 28.5337 29.31 2011-03-18
2011-03-21 20.61 25.2355 25.95 2011-03-21
2011-03-22 20.21 24.3914 24.52 2011-03-22
2011-03-23 19.17 23.9226 24.03 2011-03-23
The below works:
p1.1<-ggplot(data = vData, aes(x = Date, y = vix.Close)) + geom_line(col= "red")
p1.1
p2<-p1.1 + geom_line(data = vData[!is.na(vData$vstoxx),], aes(x = Date, y = vstoxx), col="blue")
p2
p3<-p2 + geom_line(data = vData[!is.na(vData$vxfxi.Close),], aes(x = Date, y = vxfxi.Close), col="green")
p3
p4<-p3 + labs(title = "Volatility Indexes", x = "Time", y = "Index")
p4
But this is the part that is giving me trouble:
p5<- p4 + facet_grid(Date~., scales = Date)
p5
I echo what baptiste said: what is it you're trying to do? The code you've provided suggests that you're trying to create a separate line chart for each date in the dataset, which doesn't make much sense. For this demonstration, I'll show you how to facet the data by year to see the correlations between the different measurements of volatility over time. If you provide more detail as a comment, I'll revisit the code.
First let's take a look at what you've already done.
library(tidyverse)
library(gridExtra)
library(lubridate)
library(reshape2)
#Generate dummy data
vData <- tibble(
vix.Close = rnorm(1000, mean = 12, sd = 5),
vstoxx = rnorm(1000, mean = 12, sd = 5),
vxfxi.Close = rnorm(1000, mean = 12, sd = 5),
Date = as.Date(1:1000, origin = '2011-01-01')
)
# Generate individual plots per your question
p1.1 <- ggplot(data = vData, aes(x = Date, y = vix.Close)) + geom_line(col = "red")
p1.1
p2 <- p1.1 + geom_line(data = vData[!is.na(vData$vstoxx), ], aes(x = Date, y = vstoxx), col = "blue")
p2
p3 <- p2 + geom_line(data = vData[!is.na(vData$vxfxi.Close), ], aes(x = Date, y = vxfxi.Close), col = "green")
p3
p4 <- p3 + labs(title = "Volatility Indexes", x = "Time", y = "Index")
p4
You're creating four different plots and then layering them on top of each other. This approach works here, but it becomes cumbersome when you need to change each of the geom_line calls or want to add or remove variables. Let's move your data to a "long" format and simplify the ggplot call.
# Melt the data into three columns and remove NAs
vData <- melt(vData, id = "Date") %>%
filter(!is.na(value)) %>%
as_tibble()
# Create one ggplot for all three indexes
ggplot(data = vData, aes(x = Date, y = value, color = variable)) +
geom_line() +
labs(title = "Volatility Indexes", x = "Time", y = "Index")
Now back to the big problem: you shouldn't be faceting by date because that would give you a huge number of tiny unreadable line charts. There are a number of other facets that might make sense. For example, you could look at the distribution of the three indexes by year.
ggplot(data = vData, aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
labs(title = "Volatility Indexes", x = "", y = "") +
facet_grid(year(Date) ~ .)
So put some thought into what exactly you want to show.
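If the aim was one line chart per period and not one per date, a sketch along the following lines may be closer to what the question intended. It also shows that the scales argument takes a string such as "free_x", not a column name. The data are simulated, and faceting by year is an assumption about the intent.

```r
library(ggplot2)

set.seed(42)
# Simulated long-format data: three indexes over 1000 days
dates <- as.Date(1:1000, origin = "2011-01-01")
long <- data.frame(
  Date = rep(dates, times = 3),
  variable = rep(c("vix.Close", "vstoxx", "vxfxi.Close"), each = 1000),
  value = rnorm(3000, mean = 12, sd = 5)
)
# Derive the faceting variable from the date
long$year <- format(long$Date, "%Y")

# One panel per year; "free_x" lets each panel show only its own date range
p <- ggplot(long, aes(x = Date, y = value, colour = variable)) +
  geom_line() +
  facet_wrap(~ year, ncol = 1, scales = "free_x") +
  labs(title = "Volatility Indexes", x = "Time", y = "Index")
print(p)
```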

Resources