I have a data set with 12 individuals measured at 25 time points. I want a graph with lines grouped by individual (1-12) and measurement (A, B, or C), with the time points on the x axis and the value on the y axis.
The columns of my dataset look like this (so it is already in long format):
Individuum (1 x 25; 2 x 25...) / Measurment (A B or C) / timepoint (1 - 25, 1- 25,...) / value
I already tried this:
ggplot(data = Replicate1, mapping = aes(x = Reading, y = value, linetype = Group))
but no lines are shown, and I don't know how to add the measurement.
You could do something like this, shown here on sample data.
set.seed(12)
df <- data.frame(individual = rep(1:12, each =3),
obs = LETTERS[1:3],
time = rep(1:25, each = 36),
val = sample(25:100, 900, T))
library(tidyverse)
df %>%
ggplot(aes(x= time, y = val, group = individual, color = as.factor(individual))) +
geom_line() +
facet_wrap(. ~ obs, ncol = 1)
Created on 2021-07-03 by the reprex package (v2.0.0)
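If a single panel is preferred over facets, one option is to group by the combination of individual and measurement. This is a minimal sketch, assuming the same column names as the sample data above (individual, obs, time, val). Building the data with expand.grid() and using linetype for the measurement are my own choices, not part of the answer:

```r
library(ggplot2)

set.seed(12)
# one row per individual x measurement x time point (12 * 3 * 25 = 900 rows)
df <- expand.grid(time = 1:25, obs = c("A", "B", "C"), individual = 1:12)
df$val <- sample(25:100, nrow(df), replace = TRUE)

# group on the individual/measurement combination so each pair gets its own line;
# colour identifies the individual, linetype identifies the measurement
p <- ggplot(df, aes(x = time, y = val,
                    group = interaction(individual, obs),
                    colour = factor(individual),
                    linetype = obs)) +
  geom_line()
p
```

With 36 lines in one panel the plot gets crowded, so the faceted version above may still be easier to read.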
Below I have simulated a dataset where an assignment was given to 5 groups of individuals on 5 different days (a new group with 200 new individuals each day). TrialStartDate denotes the date on which the assignment was given to each individual (ID), and TrialFinishDate denotes when each individual finished the assignment.
set.seed(123)
data <-
data.frame(
TrialStartDate = rep(c(sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by="day"), 5)), each = 200),
TrialFinishDate = sample(seq(as.Date('2019/02/01'), as.Date('2019/02/15'), by = "day"), 1000,replace = T),
ID = seq(1,1000, 1)
)
I am interested in comparing how long individuals took to complete the trial depending on when they started the trial (i.e., assuming TrialStartDate has an effect on the length of time it takes to complete the trial).
To visualize this, I want to make a barplot showing counts of IDs on each TrialFinishDate where bars are colored by TrialStartDate (since each TrialStartDate acts as a grouping variable). The best I have come up with so far is by faceting like this:
data%>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
facet_wrap(~TrialStartDate, ncol = 1)
However, I also want to add a vertical line to each facet showing when the TrialStartDate was for each group (preferably colored the same as the bars). When attempting to add vertical lines with geom_vline, it adds all the lines to each facet:
data%>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
geom_vline(xintercept = unique(data$TrialStartDate))+
facet_wrap(~TrialStartDate, ncol = 1)
How can we make the vertical lines unique to the respective group in each facet?
You're specifying xintercept outside of aes, so the faceting is not respected.
This should do the trick:
data %>%
group_by(TrialStartDate, TrialFinishDate)%>%
count()%>%
ggplot(aes(x = TrialFinishDate, y = n, col = factor(TrialStartDate), fill = factor(TrialStartDate)))+
geom_bar(stat = "identity")+
geom_vline(aes(xintercept = TrialStartDate))+
facet_wrap(~TrialStartDate, ncol = 1)
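Note the change: geom_vline(aes(xintercept = TrialStartDate)) maps xintercept inside aes(), so each facet only draws the line for its own TrialStartDate.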
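The mapped geom_vline() draws one line per row of the counted data, so the same line is overplotted several times in each facet. A possible refinement is to give the layer a small data frame with one row per start date. This sketch assumes the simulated data from the question; the names counts and starts are hypothetical. As far as I know geom_vline() does not inherit the plot-level aesthetics, so the colour is mapped explicitly in that layer to match the bars:

```r
library(dplyr)
library(ggplot2)

set.seed(123)
days <- seq(as.Date("2019-02-01"), as.Date("2019-02-15"), by = "day")
data <- data.frame(
  TrialStartDate  = rep(sample(days, 5), each = 200),
  TrialFinishDate = sample(days, 1000, replace = TRUE),
  ID = 1:1000
)

# number of IDs per start date / finish date combination
counts <- count(data, TrialStartDate, TrialFinishDate)
# one row per start date: exactly one vertical line per facet
starts <- distinct(data, TrialStartDate)

p <- ggplot(counts, aes(x = TrialFinishDate, y = n, fill = factor(TrialStartDate))) +
  geom_col() +
  geom_vline(data = starts,
             aes(xintercept = TrialStartDate, colour = factor(TrialStartDate))) +
  facet_wrap(~TrialStartDate, ncol = 1)
p
```

Because starts contains the faceting variable, each line still lands only in its own panel.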
Here is a brief description of my data: the first column is the date (monthly), the second column is a binary variable (0 or 1), and the third column is the stock return, so each month's stock return is labelled 1 or 0.
I want to calculate the 12-month rolling mean return separately for each value of the second column (0 or 1). The number of 0s and 1s within each 12-month window will vary. There should be 2 outcomes (mean_rolling_0 and mean_rolling_1).
Use rollmean() from the zoo package, and apply this per group with group_by() in dplyr.
Here's an example. I'm guessing at your data structure, but it will also work for similar structures.
library(tidyverse)
library(zoo)
# sample data
d = tibble(a = 1:100,
b = sample(c(0,1), 100, replace = T),
c = a/10 + rnorm(100))
# compute rolling mean
d2 = d %>%
group_by(b) %>%
mutate(roll = rollmean(c, 12, fill = NA, align = "right"))
# plot to see the effect
ggplot(data = d2) + geom_line(aes(x = a, y = c, colour = factor(b))) +
geom_line(aes(x = a, y = roll, colour = factor(b)), linetype = 'dashed')
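Note that the grouped rollmean() above averages the last 12 observations within each group, which usually spans more than 12 calendar months. If the window should instead cover the last 12 calendar months, with however many 0s and 1s fall into it, something like the following base R sketch could be used. It assumes the month is available as a consecutive integer index; the column names month, flag and ret and the helper window_mean are hypothetical, while mean_rolling_0 and mean_rolling_1 follow the question:

```r
set.seed(1)
d <- data.frame(month = 1:100,
                flag  = sample(c(0, 1), 100, replace = TRUE))
d$ret <- d$month / 10 + rnorm(100)

# mean return of rows with flag == g inside the 12 months ending at row i
window_mean <- function(i, g) {
  if (d$month[i] < 12) return(NA_real_)  # incomplete window at the start
  in_window <- d$month > d$month[i] - 12 &
               d$month <= d$month[i] &
               d$flag == g
  if (any(in_window)) mean(d$ret[in_window]) else NA_real_
}

d$mean_rolling_0 <- sapply(seq_len(nrow(d)), window_mean, g = 0)
d$mean_rolling_1 <- sapply(seq_len(nrow(d)), window_mean, g = 1)
head(d, 15)
```

The loop is quadratic in the number of rows, which is fine for monthly data but would need a different approach for very long series.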
I'm trying to generate a polar violin plot with ggplot2. I'd like to control the relative size of each category (the width of each category of the factor on the x axis, which then translates to angle once I make the coordinates polar).
Is there any way to do this?
Example code:
means <- runif(n = 10, min=0.1, max=0.6)
sds <- runif(n = 10, min=0.2, max=0.4)
frame <- data.frame(
cat = sample(1:10, size=10000, replace=TRUE),
value = rnorm(10000)
) %>%
mutate(
mn = means[cat],
sd = sds[cat],
value = (value * sd) + mn,
cat = factor(cat)
)
frame %>%
ggplot(aes(x = cat, y = value)) + geom_violin() +
coord_polar()
Any help or advice is appreciated.
Alternatively (and perhaps better), I'd like to be able to make a polar coordinates chart that isn't centered. Where the angles are the same for each discrete category, but the points converge, say, 1/3 of the way from the bottom of the circle, rather than in the center of the circle.
Based on comments, I'm redoing my previous answer. If what you want is a fan/weed leaf shape, you can add dummy data for additional cat values. In this example, I just doubled the number of levels in cat, but you could change this. Then I set the x breaks to only show the values that actually have data, but let the dummy values take up space to change the shape. Still not sure if this is what you meant but it's interesting to try.
library(tidyverse)
means <- runif(n = 10, min=0.1, max=0.6)
sds <- runif(n = 10, min=0.2, max=0.4)
frame <- data.frame(
cat = sample(1:10, size=10000, replace=TRUE),
value = rnorm(10000)
) %>%
mutate(
mn = means[cat],
sd = sds[cat],
value = (value * sd) + mn,
cat = factor(cat)
)
frame %>%
mutate(cat = as.integer(cat)) %>%
bind_rows(tibble(cat = 11:20, value = NA)) %>%
ggplot(aes(x = as.factor(cat), y = value)) +
geom_violin(scale = "area") +
coord_polar(start = -pi / 2) +
scale_x_discrete(breaks = 1:10)
#> Warning: Removed 10 rows containing non-finite values (stat_ydensity).
Created on 2018-05-08 by the reprex package (v0.2.0).
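A variation on the same idea, offered as a sketch rather than a tested recipe for every ggplot2 version: instead of binding dummy rows, the empty slots can be declared through the limits of the discrete scale, which avoids the warning about removed rows. The variable n_pad (number of empty slots) and the simplified data are my assumptions:

```r
library(ggplot2)

set.seed(42)
frame <- data.frame(cat = sample(1:10, 1000, replace = TRUE))
frame$value <- rnorm(1000, mean = frame$cat / 20, sd = 0.3)

n_pad <- 5                                   # hypothetical: empty slots to add
all_levels <- as.character(seq_len(10 + n_pad))
frame$cat <- factor(frame$cat, levels = all_levels)

p <- ggplot(frame, aes(x = cat, y = value)) +
  geom_violin(scale = "area") +
  # limits reserve space for the empty levels, breaks label only the real ones
  scale_x_discrete(limits = all_levels, breaks = as.character(1:10)) +
  coord_polar(start = -pi / 2)
p
```

With 10 real categories and 5 empty slots, the violins occupy two thirds of the circle; changing n_pad changes the opening of the fan.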
This is my data frame (almost 100,000 rows and 10 ID values):
Date.time P ID
1 2013-07-03 12:10:00 1114.3 J9335
2 2013-07-03 12:20:00 1114.5 K0904
3 2013-07-03 12:30:00 1114.3 K0904
4 2013-07-03 12:40:00 1114.1 K1136
5 2013-07-03 12:50:00 1114.1 K1148
............
With ggplot I create this graph:
ggplot(df) + geom_line(aes(Date.time, P, group=ID, colour=ID))
This graph works fine. But since I also have to print it in black and white, separating the IDs by colour is not a smart choice.
I tried to distinguish the IDs by line type, but the result is not very satisfying.
So my idea is to add a different symbol at the beginning and at the end of every line, so that the different IDs can also be identified on a b/w print.
I added the lines:
geom_point(data=df, aes(x=min(Date.time), y=P, shape=ID))+
geom_point(data=df, aes(x=max(Date.time), y=P, shape=ID))
But an error occurs.
Any suggestions?
Given that every line consists of around 5,000 to 10,000 values, it's impossible to plot every value as a different character. A solution could be to plot the lines and then add points with a different symbol for every ID at regular intervals (for example one symbol every 500 values). Is it possible to do that?
What about adding the geom_points using a subset of your data with only the min and max time values?
# some data
df <- data.frame(
ID = rep(c("a", "b"), each = 4),
Date.time = rep(seq(Sys.time(), by = "hour", length.out = 4), 2),
P = sample(1:10, 8))
df
# create a subset with min and max time values
# if min(x) and max(x) is the same for each ID:
df_minmax <- subset(x= df, subset = Date.time == min(Date.time) | Date.time == max(Date.time))
# if min(x) and max(x) may differ between ID,
# calculate min and max values *per* ID
# Here I use ddply, but several other aggregating functions in base R will do as well.
library(plyr)
df_minmax <- ddply(.data = df, .variables = .(ID), subset,
Date.time == min(Date.time) | Date.time == max(Date.time))
gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
geom_line(aes(group = ID, colour = ID)) +
geom_point(data = df_minmax, aes(shape = ID))
gg
If you wish to have some control over your shapes, you may have a look at ?scale_shape_discrete (with examples here).
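The per-ID subset can also be built without plyr. This is a minimal base R sketch, assuming the same column names (ID, Date.time, P). The fixed start time and the shifted time range for ID "b" are made up to show that the minimum and maximum are taken per ID:

```r
library(ggplot2)

start <- as.POSIXct("2013-07-03 12:10:00", tz = "UTC")
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  # "b" starts two hours later than "a", so the ranges differ between IDs
  Date.time = c(seq(start, by = "hour", length.out = 4),
                seq(start + 2 * 3600, by = "hour", length.out = 4)),
  P = c(3, 5, 4, 6, 8, 7, 9, 10)
)

# compare each time stamp with the min and max of its own ID
t_num <- as.numeric(df$Date.time)
is_first <- t_num == ave(t_num, df$ID, FUN = min)
is_last  <- t_num == ave(t_num, df$ID, FUN = max)
df_minmax <- df[is_first | is_last, ]

gg <- ggplot(df, aes(x = Date.time, y = P)) +
  geom_line(aes(group = ID)) +
  geom_point(data = df_minmax, aes(shape = ID), size = 3)
gg
```

The time stamps are converted to numeric before calling ave() only to sidestep any class handling of date-times inside ave().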
Edit following updated question
For each ID, add a shape to the line at some interval.
# create a slightly larger data set
df <- data.frame(
ID = rep(c("a", "b"), each = 100),
Date.time = rep(seq(Sys.time(), by = "day", length.out = 100), 2),
P = c(sample(1:10, 100, replace = TRUE), sample(11:20, 100, replace = TRUE)))
# for each ID:
# create a time sequence from min(time) to max(time), by some time step
# e.g. a week
df_gap <- ddply(.data = df, .variables = .(ID), summarize,
Date.time =
seq(from = min(Date.time), to = max(Date.time), by = "week"))
# add P from df to df_gap
df_gap <- merge(x = df_gap, y = df)
gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
geom_line(aes(group = ID, colour = ID)) +
geom_point(data = df_gap, aes(shape = ID)) +
# if your gaps are not a multiple of the length of the data
# you may wish to add the max points as well
geom_point(data = df_minmax, aes(shape = ID))
gg
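The question asks for one symbol every fixed number of values rather than every fixed time step. That variant can be sketched by numbering the rows within each ID and keeping every k-th one. The step k = 25 and the name df_every_k are assumptions for illustration; with the real data k would be something like 500:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  ID = rep(c("a", "b"), each = 100),
  Date.time = rep(seq(as.POSIXct("2013-07-03 12:10:00", tz = "UTC"),
                      by = "10 min", length.out = 100), 2),
  P = c(sample(1:10, 100, replace = TRUE), sample(11:20, 100, replace = TRUE))
)

k <- 25                                      # hypothetical step between symbols
df <- df[order(df$ID, df$Date.time), ]
# position of each row within its ID: 1, 2, 3, ...
pos <- ave(seq_len(nrow(df)), df$ID, FUN = seq_along)
# keep rows 1, k + 1, 2k + 1, ... of every ID
df_every_k <- df[(pos - 1) %% k == 0, ]

gg <- ggplot(df, aes(x = Date.time, y = P)) +
  geom_line(aes(group = ID)) +
  geom_point(data = df_every_k, aes(shape = ID), size = 3)
gg
```

This keeps the first point of each line but not necessarily the last one, so the min/max subset from above can be added as an extra layer if the end points matter.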
The error stems from the fact that the single numeric value min(Date.time) doesn't match up in length with the vectors P or ID. Another problem might be that you're re-declaring your data variable even though you already have ggplot(df).
The solution that immediately comes to mind is to figure out what the row indexes are for your minimum and maximum dates. If they all share the same minimum and maximum time stamps, then it's easy. Use the which() function to come up with an array of the row numbers you'll need.
min.index <- which(df$Date.time == min(df$Date.time))
max.index <- which(df$Date.time == max(df$Date.time))
Then use those indexes to subset the data passed to each geom_point() layer.
geom_point(data = df[min.index, ], aes(x = Date.time, y = P, shape = ID)) +
geom_point(data = df[max.index, ], aes(x = Date.time, y = P, shape = ID))
I have an experiment where three evolving populations of yeast have been studied over time. At discrete time points, we measured their growth, which is the response variable. I basically want to plot the growth of yeast as a time series, using boxplots to summarise the measurements taken at each point, and plotting each of the three populations separately. Basically, something that looks like this (as a newbie, I don't get to post actual images, so x,y,z refer to the three replicates):
| xyz
| x z xyz
| y xyz
| xyz y
| x z
|
-----------------------
t0 t1 t2
How can this be done using ggplot2? I have a feeling that there must be a simple and elegant solution, but I can't find it.
Try this code:
require(ggplot2)
df <- data.frame(
time = rep(seq(Sys.Date(), len = 3, by = "1 day"), 10),
y = rep(1:3, 10, each = 3) + rnorm(30),
group = rep(c("x", "y", "z"), 10, each = 3)
)
df$time <- factor(format(df$time, format = "%Y-%m-%d"))
p <- ggplot(df, aes(x = time, y = y, fill = group)) + geom_boxplot()
print(p)
Using x = factor(time) on its own, as in ggplot(df, aes(x = factor(time), y = y, fill = group)) + geom_boxplot() + scale_x_date(), did not work.
Converting the dates beforehand with factor(format(df$time, format = "%Y-%m-%d")) was required for this kind of plot.
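If a true date axis is wanted (for example because the time points are unevenly spaced), one alternative that I believe works is to keep time as a Date and set the group aesthetic explicitly, so that ggplot2 draws one box per time point and population. This is a sketch with made-up data using the same column names as above, not a replacement for the factor approach:

```r
library(ggplot2)

set.seed(1)
days <- as.Date("2021-01-01") + 0:2
df <- data.frame(
  time  = rep(days, each = 30),
  group = rep(rep(c("x", "y", "z"), each = 10), times = 3)
)
df$y <- as.integer(factor(df$group)) + rnorm(nrow(df))

# x stays a Date; the explicit group gives one box per time/population pair
p <- ggplot(df, aes(x = time, y = y, fill = group,
                    group = interaction(time, group))) +
  geom_boxplot() +
  scale_x_date(date_labels = "%Y-%m-%d")
print(p)
```

Without the explicit group, a continuous x axis would lead ggplot2 to pool all time points of a population into a single box.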