find time windows by conditional values in a time series - r

I have a very long time series. The timestamps are equidistant. For simplicity, I am leaving out many columns and rows:
library(data.table)
dt <- data.table(time = c(seq(0, 2, by = 0.1)),
value1 = c(1, 2, 1, 3, 2, 2, 1, 4, 6, 3, 2, 1, 2, 3, 2, 1, 3, 2, 3, 2, 1),
value2 = c(8, 7, 7, 6, 7, 8, 5, 6, 4, 1, 3, 2, 5, 7, 6, 8, 7, 8, 4, 1, 2))
The graph of the data looks like this:
library(ggplot2)
ggplot(data = dt, aes(x = time)) +
geom_line(aes(y = value1), color = "red", size = 2) +
geom_line(aes(y = value2), color = "blue", size = 2)
Now I want to find the time series windows that fulfill specific conditions. All conditions have to be met within the same time window. In this example:
the red line (value1) must be between 1 and 3
the blue line (value2) must be between 6 and 8
the time window must have a length of AT LEAST 0.5 seconds (for example: if the value conditions are met over a stretch of 2 seconds, that single 2-second window should be returned and NOT 4 x 0.5-second windows)
How am I able to implement this in R for a very long time series and multiple columns/conditions?
The goal is to find similar patterns in my data by a special set of conditions.

Consider this solution. It assigns a window_id to every timestamp that belongs to a valid window. If a timestamp is not part of a valid window, its window_id is NA.
# value constraints
dt$value_cons_met <- inrange(dt$value1,1,3) & inrange(dt$value2,6,8)
# assign all potential sequential true timestamps a group id
dt$potential_win_id <- c(0,cumsum(abs(diff(dt$value_cons_met))))
# is the window big enough?
dt[,window_size_okay := max(time)-min(time) >= 0.5 ,by = potential_win_id]
# Other window dependent constraints can be put here
# Window "ID" is defined if the valid cons are met and window size is okay
# in that case copy potential window number as window id
dt[,window_id := ifelse(value_cons_met & window_size_okay, potential_win_id,NA)]
# sample plot
library(ggplot2)
ggplot(data = dt, aes(x = time)) +
geom_line(aes(y = value1), color = "red", size = 2) +
geom_line(aes(y = value2), color = "blue", size = 2) +
geom_line(aes(y = window_id), color = "green", size =2)
This is an example plot where the detected window_id is the y value
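For comparison, here is a minimal base-R sketch of the same idea that returns one row per window instead of a per-row id. It swaps in run-length encoding (rle) for the cumsum/diff grouping above. The helper name find_windows and its tol argument are hypothetical and not part of the original answer; the tolerance is only there as a guard against floating-point noise when subtracting timestamps.

```r
# Example data from the question
dt <- data.frame(
  time   = seq(0, 2, by = 0.1),
  value1 = c(1, 2, 1, 3, 2, 2, 1, 4, 6, 3, 2, 1, 2, 3, 2, 1, 3, 2, 3, 2, 1),
  value2 = c(8, 7, 7, 6, 7, 8, 5, 6, 4, 1, 3, 2, 5, 7, 6, 8, 7, 8, 4, 1, 2)
)

# Hypothetical helper: `ok` is a logical vector, TRUE where all value
# conditions hold. Returns the start and end time of every run of TRUEs
# that spans at least `min_length`.
find_windows <- function(time, ok, min_length, tol = 1e-9) {
  runs   <- rle(ok)
  ends   <- cumsum(runs$lengths)        # last row index of each run
  starts <- ends - runs$lengths + 1     # first row index of each run
  keep   <- runs$values & (time[ends] - time[starts] >= min_length - tol)
  data.frame(start = time[starts[keep]], end = time[ends[keep]])
}

# Combine as many column conditions as needed with `&`
ok <- dt$value1 >= 1 & dt$value1 <= 3 &
      dt$value2 >= 6 & dt$value2 <= 8

windows <- find_windows(dt$time, ok, min_length = 0.5)
print(windows)
```

On the example data, only the run from 0 to 0.5 is long enough. The run from 1.3 to 1.7 meets the value conditions but spans only 0.4 seconds, so it is dropped.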

Related

Order variables geom_point based on similar pattern across x-axis in R

How could I order the variables so that they are plotted like a heat map, where variables with a similar pattern sit together, i.e. A and D at the top, then B, C, and E at the bottom? I want to avoid doing it manually, as the real data has many more variables.
Variable1 <- c(rep("A",7), rep("B",7),rep("C",7), rep("D",7), rep("E",7))
Variable2 <- c(rep(1:7, 5))
value <- c(15, 16, 11, 12, 13, 11, 12, 4, 3, 6, 5, 4, 3, 2, 3, 3, 2, 3, 3, 4, 3, 18, 17, 15, 2, 3, 4, 5, 2, 3, 4, 5, 6, 10, 18)
dff <- data.frame(Variable1, Variable2, value)
library(dplyr)
dff <- dff %>%group_by(Variable1)%>%
mutate(scaled_val = scale(value)) %>%
ungroup()
dff$Variable <- factor(dff$Variable1,levels=rev(unique(dff$Variable1)))
ggplot(dff, aes(x = Variable2, y = Variable1, label=NA)) +
geom_point(aes(size = scaled_val, colour = value)) +
geom_point(aes(size = scaled_val, colour = value), shape=21, colour="black") +
geom_text(hjust = 1, size = 2) +
theme_bw()+
scale_color_gradient(low = "lightblue", high = "darkblue")+
scale_x_discrete(expand=c(1,0))+
coord_fixed(ratio=4)
And desired:
If you look at a heat map with rows clustered by similarity, for example https://3.bp.blogspot.com/-AI2dxe95VHk/TgTJtEkoBgI/AAAAAAAAC5w/XCyBw3qViGA/s400/heatmap_cluster2.png, you see that the rows at the top are the ones whose high values fall on the first x-axis timepoints, followed by the ones that are higher at the last x-axis timepoints.
To do: I wonder whether, using the scaled value, we can order the rows so that the top ones have the higher mean in Variable2 (1:2), then the higher mean in Variable2 (3:5), then Variable2 (6:7). Let me know if I am not being clear and I can explain better.
It sounds like you want to arrange groups A-E based on their mean. You can do that by converting Variable1 into a factor with custom levels:
lvls <- names(sort(by(dff$value, dff$Variable1, mean)))
dff$Variable1 <- factor(dff$Variable1, levels = lvls)
Here's a solution that sorts groups by which.max:
peaks <- c(by(dff$value, dff$Variable1, which.max))
lvls <- names(sort(peaks))
dff$Variable1 <- factor(dff$Variable1, levels = lvls)
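As a self-contained sketch of the which.max ordering, the snippet below rebuilds the example data and uses sapply over split instead of by (a substitution, not the answer's exact code). It also reverses the levels, on the assumption that the earliest-peaking group should sit at the top: ggplot2 draws the first factor level at the bottom of a discrete y axis.

```r
# Example data from the question
dff <- data.frame(
  Variable1 = rep(c("A", "B", "C", "D", "E"), each = 7),
  Variable2 = rep(1:7, times = 5),
  value     = c(15, 16, 11, 12, 13, 11, 12,
                4, 3, 6, 5, 4, 3, 2,
                3, 3, 2, 3, 3, 4, 3,
                18, 17, 15, 2, 3, 4, 5,
                2, 3, 4, 5, 6, 10, 18)
)

# Position (1..7) of the largest value within each group
peaks <- sapply(split(dff$value, dff$Variable1), which.max)

# Groups ordered from earliest to latest peak
lvls <- names(sort(peaks))

# Reverse so that the earliest-peaking group ends up at the top of the y axis
dff$Variable1 <- factor(dff$Variable1, levels = rev(lvls))
print(levels(dff$Variable1))
```

With these levels, D and A are drawn at the top, followed by B and C, with E at the bottom.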

How to draw a stacked barplot with three categorical variables representing the proportion of only one of them for each facet in r?

This is the data to take up as a reference
df <- data.frame(a = c(3,3,3,3,3,2,2,3,2,1,1,1,3,1,3), b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2), c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2) )
I want to draw a bargraph with the proportion of a for each facet. At the same time I want the bars to be colored according to the b value.
The variable b is not relevant for calculating the percentage. This is what I came up with, when I set the fill = c, it divides the stacked color in two, one corresponding to 1, and the other as NA.
ggplot(df, aes(x = a, y = ..prop.., group = 1, fill = b)) +
geom_bar(position = "stack") +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c")
How can I get a result similar to this one, but with the proportions of a within each facet instead of the absolute values?
Thank you!
Here's an approach using the ..count.. and ..PANEL.. special symbols:
ggplot(df, aes(x = a, fill = as.factor(b))) +
geom_bar(aes(y = ..count.. / tapply(..count..,..PANEL..,sum)[..PANEL..])) +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c", fill = "b", y = "Proportion")
If you weren't using facet_wrap, this would be as simple as mapping y to ..prop.. in the aesthetics. However, ..prop.. is not calculated properly per facet. To get around this, we can use tapply and the ..PANEL.. special symbol to sum ..count.. within each panel only. The trailing [..PANEL..] subsets the resulting vector so that each bar is divided by the total of its own panel.
The other issue you had was that b is class numeric, so you need to convert that to factor.
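An alternative that avoids the special symbols is to compute the per-facet proportions before plotting. The sketch below does this in base R; the column names Freq and prop and the use of table and ave are choices made for this illustration, not part of the answer above.

```r
# Example data from the question
df <- data.frame(
  a = c(3, 3, 3, 3, 3, 2, 2, 3, 2, 1, 1, 1, 3, 1, 3),
  b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2),
  c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2)
)

# Count every (a, b, c) combination; the columns come back as factors
counts <- as.data.frame(table(a = df$a, b = df$b, c = df$c))
counts <- counts[counts$Freq > 0, ]

# Divide each count by the total of its facet (its value of c)
counts$prop <- counts$Freq / ave(counts$Freq, counts$c, FUN = sum)

print(counts)
```

Within each value of c the prop column sums to 1, so the table could be passed to geom_col with x = a, y = prop, fill = b and facet_wrap(~c) to draw stacked bars of proportions.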

Multiple horizontal barplots in one chart

I want to have two charts containing multiple horizontal bar graphs, each showing mean values of one of the two variables: fear and expectation. The bar graphs should be grouped by the dummies.
I have created single bar graphs with the mean values of fear and expectation grouped by each of the dummies but I don't know how to combine them properly.
x = data.frame(
id = c(1, 2, 3, 4, 5),
sex = c(1, 0, 1, 0, 1),
migration = c(0, 1, 0, 1, 0),
handicap = c(0, 1, 1, 1, 0),
east = c(0, 1, 1, 1, 0),
fear = c(1, 3, 4, 6, 3),
expectation = c(2, 3, 2, 5, 4))
I want to have it look like this basically:
https://ibb.co/3fz0GQ4
Any help would be greatly appreciated.
To get the plot you show, you will need to reshape your data a bit:
library(tidyverse)
x2 <- x%>%
gather(fear, expectation, key = "group", value = "value")%>%
gather(sex, migration, handicap, east, key = "dummies", value = "dum_value")%>%
group_by(group, dummies, dum_value)%>%
summarize(prop = mean(value))
Then you can easily get to the plot:
x2%>%
ggplot(aes(y= prop, x = dummies, fill = factor(dum_value)))+
geom_bar(stat = "identity", position = "dodge")+
coord_flip()+
facet_wrap(~group)
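To cross-check the summarised values without the tidyverse, the same table can be built in base R. This is only a sketch: it reuses the column names of x2 from above, and the nested lapply/tapply construction is an assumption of this example rather than the answer's method.

```r
# Example data from the question
x <- data.frame(
  id = c(1, 2, 3, 4, 5),
  sex = c(1, 0, 1, 0, 1),
  migration = c(0, 1, 0, 1, 0),
  handicap = c(0, 1, 1, 1, 0),
  east = c(0, 1, 1, 1, 0),
  fear = c(1, 3, 4, 6, 3),
  expectation = c(2, 3, 2, 5, 4)
)

dummies  <- c("sex", "migration", "handicap", "east")
outcomes <- c("fear", "expectation")

# One row per outcome, dummy variable and dummy level, holding the mean outcome
res <- do.call(rbind, lapply(outcomes, function(o) {
  do.call(rbind, lapply(dummies, function(d) {
    m <- tapply(x[[o]], x[[d]], mean)
    data.frame(group = o, dummies = d,
               dum_value = as.numeric(names(m)),
               prop = as.numeric(m))
  }))
}))

print(res)
```

The result has 16 rows (2 outcomes x 4 dummies x 2 levels) and can be plotted with the same ggplot call as x2.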

displaying different symbols for each point within the same factor in R ggplot2

I am trying to create a plot to show the mean of calculated values within each group (organised by factors), as well as the individual points themselves. I have managed to do this successfully, however all the points use the same symbol. I want to have a different symbol for each of the points within each factor, and preferably use the same symbols in the same order for each factor.
An example version of the kind of graph I am currently making is below, however all the points within the same column use the same symbol.
I have thought about using the row number of the points to define the symbol shape, but I think there are only 25 different shapes available in the default ggplot2 package, and my real data has more than 25 points, plus I would prefer if the same points were used in each column, to keep the graph looking consistent.
Mean_list <- data.frame(Cells = factor(c("Celltype1", "Celltype2", "Celltype3",
"Celltype4"),
levels =c("Celltype1", "Celltype2", "Celltype3", "Celltype4")),
Mean = c(mean(c(1, 2, 3)), mean(c(5, 8, 4)), mean(c(9, 8 ,3)),
mean(c(3, 6, 8, 5))))
values_list <- data.frame(Cells2 = rep(c("Celltype1", "Celltype2", "Celltype3",
"Celltype4"), times = c(length(c(1, 2, 3)),
length(c(5, 8, 4)), length(c(9, 8 ,3)),
length(c(3, 6, 8, 5)))),
values = c(1, 2, 3, 5, 8, 4, 9, 8, 3, 3, 6, 8, 5))
ggplot() + geom_col(data = Mean_list, aes(Cells, Mean, fill = Cells)) +
geom_point(data = values_list, aes(Cells2, values))
Before plotting, we can number the rows within each cell type and map that number to the point shape:
library(dplyr)
library(ggplot2)
# number the points within each cell type: 1, 2, 3, ...
values_list <- values_list %>%
group_by(Cells2) %>%
mutate(shape = factor(seq_along(values))) %>%
ungroup()
ggplot() +
geom_col(data = Mean_list, aes(Cells, Mean, fill = Cells)) +
geom_point(data = values_list, aes(Cells2, values, shape = shape))
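The within-group numbering can also be done without dplyr. Here is a minimal base-R sketch using ave; the intermediate variable idx is introduced only for this illustration.

```r
# Example data from the question
values_list <- data.frame(
  Cells2 = rep(c("Celltype1", "Celltype2", "Celltype3", "Celltype4"),
               times = c(3, 3, 3, 4)),
  values = c(1, 2, 3, 5, 8, 4, 9, 8, 3, 3, 6, 8, 5)
)

# Running index of each row within its cell type: 1, 2, 3, ...
idx <- ave(seq_len(nrow(values_list)), values_list$Cells2, FUN = seq_along)

# A factor, so that ggplot2 treats it as a discrete shape aesthetic
values_list$shape <- factor(idx)

print(values_list)
```

Because the index restarts for every cell type, the first point in each column gets the same symbol, the second point the next symbol, and so on. Note that ggplot2's default shape scale only handles up to 6 levels; for more points per group, the shapes would have to be supplied explicitly, for example with scale_shape_manual(values = ...).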

Plot overlapping vertical lines with ggplot

I have a list of time-ordered pairwise interactions. I want to plot a temporal network of these interactions, which would look something like the diagram below.
My data looks like the example below. The id1 and id2 values are the unique identifiers of individuals. The time indicates when an interaction between those individuals occurred. So at time = 1, I want to plot a connection between individual-1 and individual-2.
id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)
According to this StackOverflow question, I can see that it is possible to draw vertical lines between positions on the y-axis in ggplot. This is achieved by reshaping the data into a long format. This is fine when there is only one pair per time value, but not when there is more than one interacting pair at a time. For example in my dummy data, at time = 2, there are three pairs (in the plot I would show these by overlaying lines with reduced opacity).
My question is, how can I organise these data in a way that ggplot will be able to plot potentially multiple interacting pairs at specified time points?
I have been trying to reorganise the data by assigning an extra identifier to each of the multiple pairs that occur at the same time. I imagined the data table to look like this, but I haven't figured out how to make this in R... In this example the three interactions at time = 2 are identified by an extra grouping of either 1, 2 or 3. Even if I could arrange this, I'm still not sure how I would get ggplot to read it.
Ultimately I'm trying to create something that looks like Fig. 2 in this scientific paper.
Any help would be appreciated!
You can do this without reshaping the data, just set one id to y and the other id to yend in geom_curve:
ggplot(df, aes(x = time, y = id1)) +
geom_curve(aes(xend = time, yend = id2), curvature = 0.3) +
geom_hline(yintercept = 1:7, colour = scales::muted("blue")) +
geom_point(size = 3) +
geom_point(aes(y = id2), size = 3) +
coord_cartesian(xlim = c(0, max(df$time) + 1)) +
theme_bw()
Output:
Libraries:
library('ggplot2')
library('data.table')
Data:
id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)
setDT(df)
df1 <- melt.data.table( df, id.vars = c('time'))
Plot:
p <- ggplot( df1, aes(time, value)) +
geom_point() +
geom_curve( mapping = aes(x = time, y = id1, xend = time, yend = id2, colour = "curve"),
data = df,
curvature = 0.2 )
print(p)
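The extra per-time identifier described in the question can also be built directly, in case a long format is preferred over geom_curve. The sketch below is one possible way to do it in base R; the column names pair, end and id are hypothetical and chosen for this example.

```r
# Example data from the question
id1  <- c(1, 2, 1, 6, 2, 2, 1)
id2  <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df   <- data.frame(id1, id2, time)

# Number the interactions that share the same time value: 1, 2, 3, ...
df$pair <- ave(df$time, df$time, FUN = seq_along)

# Long format: one row per end point of every interaction
long <- rbind(
  data.frame(time = df$time, pair = df$pair, end = "id1", id = df$id1),
  data.frame(time = df$time, pair = df$pair, end = "id2", id = df$id2)
)
long <- long[order(long$time, long$pair), ]

print(long)

# In ggplot2, each interaction can then be drawn as its own vertical line by
# grouping on time and pair, e.g.
# geom_line(aes(time, id, group = interaction(time, pair)), alpha = 0.5)
```

The three interactions at time = 2 receive pair values 1, 2 and 3, which matches the extra grouping column imagined in the question.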

Resources