Plot overlapping vertical lines with ggplot - r

I have a list of time-ordered pairwise interactions. I want to plot a temporal network of these interactions, which would look something like the diagram below.
My data looks like the example below. The id1 and id2 values are the unique identifiers of individuals. The time indicates when an interaction between those individuals occurred. So at time = 1, I want to plot a connection between individual-1 and individual-2.
id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)
According to this StackOverflow question, I can see that it is possible to draw vertical lines between positions on the y-axis in ggplot. This is achieved by reshaping the data into a long format. This is fine when there is only one pair per time value, but not when there is more than one interacting pair at a time. For example in my dummy data, at time = 2, there are three pairs (in the plot I would show these by overlaying lines with reduced opacity).
My question is, how can I organise these data in a way that ggplot will be able to plot potentially multiple interacting pairs at specified time points?
I have been trying to reorganise the data by assigning an extra identifier to each of the multiple pairs that occur at the same time. I imagined the data table to look like this, but I haven't figured out how to make this in R... In this example the three interactions at time = 2 are identified by an extra grouping of either 1, 2 or 3. Even if I could arrange this, I'm still not sure how I would get ggplot to read it.
Ultimately I'm trying to create something that looks like Fig. 2 in this scientific paper.
Any help would be appreciated!

You can do this without reshaping the data. Just map one id to y and the other id to yend in geom_curve:
ggplot(df, aes(x = time, y = id1)) +
geom_curve(aes(xend = time, yend = id2), curvature = 0.3) +
geom_hline(yintercept = 1:7, colour = scales::muted("blue")) +
geom_point(size = 3) +
geom_point(aes(y = id2), size = 3) +
coord_cartesian(xlim = c(0, max(df$time) + 1)) +
theme_bw()
Output:

Libraries:
library('ggplot2')
library('data.table')
Data:
id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)
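setDT(df)  # convert to a data.table by reference
# long format: one row per (time, id) endpoint; melt() dispatches to the data.table method
df1 <- melt(df, id.vars = "time")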
Plot:
p <- ggplot( df1, aes(time, value)) +
geom_point() +
geom_curve( mapping = aes(x = time, y = id1, xend = time, yend = id2, colour = "curve"),
data = df,
curvature = 0.2 )
print(p)
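The extra per-time identifier described in the question can also be built directly, in case the long format is preferred after all. The following is a minimal sketch in base R, assuming the example data from the question. The column names pair and grp are made up for illustration, and the ggplot2 part only runs if the package is installed.

```r
# Example data from the question
df <- data.frame(
  id1  = c(1, 2, 1, 6, 2, 2, 1),
  id2  = c(2, 4, 5, 7, 3, 4, 5),
  time = c(1, 2, 2, 2, 3, 4, 5)
)

# Number the pairs within each time point: 1, 2, 3, ... per time value
# ('pair' is a hypothetical column name)
df$pair <- ave(seq_along(df$time), df$time, FUN = seq_along)

# Long format: one row per endpoint, so two rows per interaction
long <- data.frame(
  time = rep(df$time, times = 2),
  pair = rep(df$pair, times = 2),
  id   = c(df$id1, df$id2)
)

# One group per interaction, i.e. per (time, pair) combination
long$grp <- interaction(long$time, long$pair, drop = TRUE)

# Optional plot: one semi-transparent vertical line per interaction
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = time, y = id, group = grp)) +
    geom_line(alpha = 0.4) +
    geom_point(size = 3)
}
```

Calling print(p) draws the plot. Because every interaction has its own grp value, several pairs at the same time value are drawn as separate lines, and the reduced alpha makes overlaps visible.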

Related

find time windows by conditional values in a time series

I have a very long time series. The timestamps are equidistant. For simplicity, I am leaving out many columns and rows:
library(data.table)
dt <- data.table(time = c(seq(0, 2, by = 0.1)),
value1 = c(1, 2, 1, 3, 2, 2, 1, 4, 6, 3, 2, 1, 2, 3, 2, 1, 3, 2, 3, 2, 1),
value2 = c(8, 7, 7, 6, 7, 8, 5, 6, 4, 1, 3, 2, 5, 7, 6, 8, 7, 8, 4, 1, 2))
The graph of the data looks like this:
library(ggplot2)
ggplot(data = dt, aes(x = time)) +
geom_line(aes(y = value1), color = "red", size = 2) +
geom_line(aes(y = value2), color = "blue", size = 2)
Now I want to find the time windows that fulfill specific conditions. All conditions have to be met within the same time window. In this example:
the red line (value1) must be between 1 and 3
the blue line (value2) must be between 6 and 8
the time window must have a length of AT LEAST 0.5 seconds (for example: if the value conditions are met over a stretch of 2 seconds, that whole 2-second window should be returned and NOT four 0.5-second windows)
How am I able to implement this in R for a very long time series and multiple columns/conditions?
The goal is to find similar patterns in my data by a special set of conditions.
Consider this solution. It defines a window_id for every timestamp that lies in a valid window. If a timestamp is not part of a valid window, window_id is NA.
# value constraints
dt$value_cons_met <- inrange(dt$value1,1,3) & inrange(dt$value2,6,8)
# assign all potential sequential true timestamps a group id
dt$potential_win_id <- c(0,cumsum(abs(diff(dt$value_cons_met))))
# is the window big enough?
dt[,window_size_okay := max(time)-min(time) >= 0.5 ,by = potential_win_id]
# Other window dependent constraints can be put here
# Window "ID" is defined if the valid cons are met and window size is okay
# in that case copy potential window number as window id
dt[,window_id := ifelse(value_cons_met & window_size_okay, potential_win_id,NA)]
# sample plot
library(ggplot2)
ggplot(data = dt, aes(x = time)) +
geom_line(aes(y = value1), color = "red", size = 2) +
geom_line(aes(y = value2), color = "blue", size = 2) +
geom_line(aes(y = window_id), color = "green", size =2)
This is an example plot where the detected window_id is the y value
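If a compact table of the detected windows is more useful than a per-row id, the same idea can be sketched with run-length encoding in base R. This is only an illustration under the assumptions of the example data. The names ok, windows, min_len and tol are invented here, and the small tolerance is an added guard against floating-point error when comparing time differences.

```r
# Example data from the question
time   <- seq(0, 2, by = 0.1)
value1 <- c(1, 2, 1, 3, 2, 2, 1, 4, 6, 3, 2, 1, 2, 3, 2, 1, 3, 2, 3, 2, 1)
value2 <- c(8, 7, 7, 6, 7, 8, 5, 6, 4, 1, 3, 2, 5, 7, 6, 8, 7, 8, 4, 1, 2)

# TRUE where all value conditions hold at the same timestamp
ok <- value1 >= 1 & value1 <= 3 & value2 >= 6 & value2 <= 8

# Consecutive runs of TRUE/FALSE and their first/last row indices
runs   <- rle(ok)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs where the conditions are met
windows <- data.frame(start = time[starts], end = time[ends])[runs$values, ]
windows$duration <- windows$end - windows$start

# Keep windows that are long enough (tolerance is an assumption, see above)
min_len <- 0.5
tol     <- 1e-9
windows <- windows[windows$duration >= min_len - tol, ]
print(windows)
```

With the example data, the conditions hold from time 0 to 0.5 and again from 1.3 to 1.7; only the first stretch reaches the minimum length, so it is the only row left in windows.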

How to draw a stacked barplot with three categorical variables representing the proportion of only one of them for each facet in r?

This is the data to take up as a reference
df <- data.frame(a = c(3,3,3,3,3,2,2,3,2,1,1,1,3,1,3), b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2), c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2) )
I want to draw a bargraph with the proportion of a for each facet. At the same time I want the bars to be colored according to the b value.
The variable b is not relevant for calculating the percentage. This is what I came up with, when I set the fill = c, it divides the stacked color in two, one corresponding to 1, and the other as NA.
ggplot(df, aes(x = a, y = ..prop.., group = 1, fill = b)) +
geom_bar(position = "stack") +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c")
How can I have a result similar to this one but with the proportions of a for each facet wrap instead of the absolute values?
Thank you!
Here's an approach using the ..count.. and ..PANEL.. special symbols:
ggplot(df, aes(x = a, fill = as.factor(b))) +
geom_bar(aes(y = ..count.. / tapply(..count..,..PANEL..,sum)[..PANEL..])) +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c", fill = "b", y = "Proportion")
If you weren't using facet_wrap, this would be trivial: you could simply set y = ..prop.. and be done. However, ..prop.. is not calculated properly across facets. To get around this problem, we can use tapply and the ..PANEL.. special symbol to sum ..count.. within each panel. The trailing [..PANEL..] indexes the resulting vector of panel totals, so that each bar is divided by the total of its own panel.
The other issue you had was that b is class numeric, so you need to convert that to factor.
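An alternative that avoids the special symbols is to compute the per-facet proportions before plotting. Below is a rough sketch in base R, assuming the example data from the question. The names counts, n and prop are illustrative, and the plotting part only runs if ggplot2 is installed.

```r
# Example data from the question
df <- data.frame(
  a = c(3, 3, 3, 3, 3, 2, 2, 3, 2, 1, 1, 1, 3, 1, 3),
  b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2),
  c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2)
)

# Count rows for every (a, b, c) combination and drop the empty ones;
# a, b and c become factors in the resulting data frame
counts <- as.data.frame(table(a = df$a, b = df$b, c = df$c), responseName = "n")
counts <- counts[counts$n > 0, ]

# Divide each count by the total number of rows in its facet (value of c)
counts$prop <- counts$n / ave(counts$n, counts$c, FUN = sum)

# Optional plot: stacked bars of a, coloured by b, proportions per facet
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = a, y = prop, fill = b)) +
    geom_col() +
    facet_wrap(~c, nrow = 1) +
    labs(y = "Proportion within facet")
}
```

Since the proportions are ordinary columns, they can be inspected or checked (for instance, that they sum to 1 within each value of c) before anything is drawn.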

displaying different symbols for each point within the same factor in R ggplot2

I am trying to create a plot to show the mean of calculated values within each group (organised by factors), as well as the individual points themselves. I have managed to do this successfully, however all the points use the same symbol. I want to have a different symbol for each of the points within each factor, and preferably use the same points in the same order for each factor.
An example version of the kind of graph I am currently making is below, however all the points within the same column use the same symbol.
I have thought about using the row number of the points to define the symbol shape, but I think there are only 25 different shapes available in the default ggplot2 package, and my real data has more than 25 points, plus I would prefer if the same points were used in each column, to keep the graph looking consistent.
Mean_list <- data.frame(Cells = factor(c("Celltype1", "Celltype2", "Celltype3",
"Celltype4"),
levels =c("Celltype1", "Celltype2", "Celltype3", "Celltype4")),
Mean = c(mean(c(1, 2, 3)), mean(c(5, 8, 4)), mean(c(9, 8 ,3)),
mean(c(3, 6, 8, 5))))
values_list <- data.frame(Cells2 = rep(c("Celltype1", "Celltype2", "Celltype3",
"Celltype4"), times = c(length(c(1, 2, 3)),
length(c(5, 8, 4)), length(c(9, 8 ,3)),
length(c(3, 6, 8, 5)))),
values = c(1, 2, 3, 5, 8, 4, 9, 8, 3, 3, 6, 8, 5))
ggplot() + geom_col(data = Mean_list, aes(Cells, Mean, fill = Cells)) +
geom_point(data = values_list, aes(Cells2, values))
Before plotting, we can assign a number to each row within a cell type (this uses dplyr for group_by, mutate and the pipe):
library(dplyr)
values_list <- values_list %>% group_by(Cells2) %>% mutate(shape = factor(seq_along(values)))
ggplot() +
geom_col(data = Mean_list, aes(Cells, Mean, fill = Cells)) +
geom_point(data = values_list, aes(Cells2, values, shape = shape))
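ggplot2's default shape scale only covers six discrete values, so with more points per group the shapes need to be supplied by hand. The sketch below shows one way this might look, using base R for the within-group numbering. The column idx and the vector shape_codes are made-up names, and the particular symbol codes are just one possible choice.

```r
# Example data from the question
values_list <- data.frame(
  Cells2 = rep(c("Celltype1", "Celltype2", "Celltype3", "Celltype4"),
               times = c(3, 3, 3, 4)),
  values = c(1, 2, 3, 5, 8, 4, 9, 8, 3, 3, 6, 8, 5)
)

# Position of each point within its cell type: 1, 2, 3, ... per group
values_list$idx   <- ave(seq_along(values_list$values), values_list$Cells2,
                         FUN = seq_along)
values_list$shape <- factor(values_list$idx)

# Hand-picked point symbols, one per position; extend this vector as needed
shape_codes <- c(16, 17, 15, 18, 8, 3, 4, 0, 1, 2)
stopifnot(nlevels(values_list$shape) <= length(shape_codes))

# Optional plot: the same symbol is used for the same position in every column
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(values_list, aes(Cells2, values, shape = shape)) +
    geom_point(size = 3) +
    scale_shape_manual(values = shape_codes)
}
```

Because the numbering restarts in each cell type, the first point of every column gets the first symbol, the second point the second symbol, and so on.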

Using ggplot in R to create a line graph for two different groups

I'm trying to create a line graph depicting different trajectories over time for two groups/conditions. I have two groups for which the data 'eat' was collected at five time points (1,2,3,4,5).
I'd like the lines to connect the mean point for each group at each of five time points, so I'd have two points at Time 1, two points at Time 2, and so on.
Here's a reproducible example:
#Example data
library(tidyverse)
library(ggplot2)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$time <- as.factor(df$time)
df$Condition <- as.factor(df$Condition)
#Create the plot.
library(ggplot2)
ggplot(df, aes(x = time, y = eat, fill = Condition)) + geom_line() +
geom_point(size = 4, shape = 21) +
stat_summary(fun.y = mean, colour = "red", geom = "line")
The problem is, I need my lines to go horizontally (ie to show two different colored lines moving across the x-axis). But this code just connects the dots vertically:
If I don't convert Time to a factor, but only convert Condition to a factor, I get a mess of lines. The same thing happens in my actual data, as well.
I'd like it to look like this aesthetically, with the transparent error envelopes wrapping each line. However, I don't want it to be curvy, I want the lines to be straight, connecting the means at each point.
Here are the lines running in straight segments through the means at each time, with the ribbon showing the mean plus or minus one standard error of the points at that time. One stat_summary draws the mean line using the colour aesthetic; the other draws the ribbon using the inherited fill aesthetic. ggplot2::mean_se is a convenient function that takes a vector and returns a data frame with the mean and +/- some number of standard errors. This is the right format for the fun.data argument to stat_summary, which passes these values to the geom specified. Here, geom_ribbon accepts ymin and ymax values to plot a ribbon across the graph.
library(tidyverse)
set.seed(12345)
eat <- sample(1:7, size = 30, replace = T)
df <- data.frame(
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$Condition <- as.factor(df$Condition)
ggplot(df, aes(x = time, y = eat, fill = Condition)) +
geom_point(size = 4, shape = 21, colour = "black") +
stat_summary(geom = "ribbon", fun.data = mean_se, alpha = 0.2) +
stat_summary(
mapping = aes(colour = Condition),
geom = "line",
fun.y = mean,
show.legend = FALSE
)
Created on 2018-07-09 by the reprex package (v0.2.0).
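To make explicit what the ribbon represents, the same summary can be computed by hand. The following is a sketch in base R under the assumption that the band is the mean plus or minus one standard error, with the standard error taken as sd(x) / sqrt(length(x)). The names summ, ymin and ymax are illustrative, and the plot is only built if ggplot2 is installed.

```r
set.seed(12345)
df <- data.frame(
  Condition = factor(rep(c(0, 1), each = 15)),
  time      = rep(1:5, times = 6),
  eat       = sample(1:7, size = 30, replace = TRUE)
)

# Mean and standard error of eat for every Condition x time cell
summ <- aggregate(
  eat ~ Condition + time, data = df,
  FUN = function(x) c(mean = mean(x), se = sd(x) / sqrt(length(x)))
)
# aggregate() returns a matrix column; flatten it into eat.mean and eat.se
summ <- do.call(data.frame, summ)

# Band limits: mean minus/plus one standard error
summ$ymin <- summ$eat.mean - summ$eat.se
summ$ymax <- summ$eat.mean + summ$eat.se

# Optional plot: straight lines through the means with a transparent band
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(summ, aes(x = time, fill = Condition)) +
    geom_ribbon(aes(ymin = ymin, ymax = ymax), alpha = 0.2) +
    geom_line(aes(y = eat.mean, colour = Condition))
}
```

Having the summary as a data frame also makes it easy to swap in a different band, such as the standard deviation, by changing the function passed to aggregate.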
Here's my best guess at what you want:
# keep time as numeric
df$time = as.numeric(as.character(df$time))
ggplot(df, aes(x = time, y = eat, group = Condition)) +
geom_smooth(
aes(fill = Condition, linetype = Condition),
method = "lm",
level = 0.65,
color = "black",
size = 0.3
) +
geom_point(aes(color = Condition))
Setting level = 0.65 gives a band of roughly +/- 1 standard error around the linear model fit.
I think this code will get you most of the way there
library(tidyverse)
eat <- sample(1:7, size = 30, replace = TRUE)
tibble(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = factor(rep(c(0, 1), each = 15)),
time = factor(rep(c(1, 2, 3, 4, 5), 6)),
eat = eat) %>%
ggplot(aes(x = time, y = eat, fill = Condition, group = Condition)) +
geom_point(size = 4, shape = 21) +
geom_smooth()
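geom_smooth may get you close, I think. Note that by default geom_smooth() fits a smoothed curve (loess for a data set of this size) rather than a linear model, so the line does not pass exactly through the group means. The group = Condition aesthetic is what lets it draw one curve per condition when x is a factor. For straight segments that connect the means, stat_summary with geom = "line" is the more direct tool.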

How to limit the number of categories in a pie chart

The code below generates a pie chart by AlertTypeId. However, there are too many AlertTypeId values, and I'd like to limit the slices in the pie to the X most frequent alert types, with the rest going into an "Other" category. How can I do that with ggplot2?
a = c(0, 0, 0, 1, 2, 3, 3, 3)
b = c(1, 1, 0, 0, 1, 1, 1, 1)
c = c(1, 4, 2, 2, 2, 1, 1, 3)
sa2 = data.frame(WeekOfYear = a, UrgentState = b, AlertTypeId = c, IsUrgent = b)
ggplot(sa2, aes(x = factor(1), fill = factor(AlertTypeId))) +
geom_bar(width = 1) +
coord_polar(theta = "y")
There are many ways to go about it, but the basic idea is that you need to:
1. Identify which AlertTypeIds you want to keep. This involves counting the number of rows per id.
2. Send ggplot a data.frame (or data.table) containing only those rows that you want to plot.
Here is an example using data.table:
Edit: I broke this up into multiple lines to make it easier to follow
library(data.table)
sa2.DT <- data.table(sa2, key="AlertTypeId")
# we can count the rows per id, by taking the length of any other column
ATid.Counts <- sa2.DT[, list(AT.count=length(UrgentState)), by=AlertTypeId]
# then order Id's by their counts. We will then take the `head( )`
# of this vector to identify the group being kept
ATid.Ordered <- ATid.Counts[order(AT.count, decreasing=TRUE), AlertTypeId]
ATid.Ordered is the list of Ids ordered by their frequency count.
Taking head(ATid.Ordered, n) will give the top n many of those.
Since we had set the key to sa2.DT as these Ids, we can therefore use
the ordered list (or a portion of it) to subset the data.table
# select only those rows which have an AlertTypeId in the top n many
dat <- sa2.DT[.(head(ATid.Ordered, n=3)) ] # <~~ note the dot in `.( )`
dat is the data.table (or data.frame) that we will use in ggplot
# use that selection to plot
ggplot(dat, aes(x = factor(1), fill = factor(AlertTypeId))) +
geom_bar(width = 1) +
coord_polar(theta = "y")
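The selection above drops the less frequent ids rather than pooling them. If an "Other" slice is wanted, as in the question, one possible approach is to recode the ids before plotting. This is a minimal base R sketch, not part of the original answer. The names n_keep, top_ids and AlertGroup are invented for illustration, and the plot is only built if ggplot2 is installed.

```r
# AlertTypeId column from the question's example data
sa2 <- data.frame(AlertTypeId = c(1, 4, 2, 2, 2, 1, 1, 3))

# Number of ids to keep as their own slice (assumed value for illustration)
n_keep <- 2

# Ids ordered by how often they occur, most frequent first
counts  <- sort(table(sa2$AlertTypeId), decreasing = TRUE)
top_ids <- names(counts)[seq_len(min(n_keep, length(counts)))]

# Keep the top ids as they are and pool everything else into "Other"
id_chr <- as.character(sa2$AlertTypeId)
sa2$AlertGroup <- factor(
  ifelse(id_chr %in% top_ids, id_chr, "Other"),
  levels = c(top_ids, "Other")
)

# Optional plot: same pie chart as before, now with an "Other" slice
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sa2, aes(x = factor(1), fill = AlertGroup)) +
    geom_bar(width = 1) +
    coord_polar(theta = "y")
}
```

Putting "Other" last in the factor levels keeps it at the end of the legend regardless of how many rows it pools.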

Resources