Plotting multiple columns, grouping by date, and adjusting scale - r

So here I what I want, I want to plot 4 columns (Standing, Sitting, Stepping, Cycling) vs Time, and have 1 plot per date. I also want the Y scale to be scaled between 0.5 and 4.5, BUT the Y axis be invisible and a legend saying which color is which.
Here is a sample of my data:
> head(graph_pre,30)
Date Time Axis1 Axis2 Axis3 VM Standing Stepping Cycling New_Sitting Counter
1 2022-05-10 2022-05-10 09:01:00 21 40 2 45.22 0 0 2 0 0
2 2022-05-10 2022-05-10 09:01:01 0 36 1 36.01 0 0 0 1 1
3 2022-05-10 2022-05-10 09:01:02 24 1 0 24.02 0 0 0 1 0
4 2022-05-10 2022-05-10 09:01:03 48 31 4 57.28 0 0 2 0 1
5 2022-05-10 2022-05-10 09:01:04 0 6 0 6.00 0 0 0 1 1
6 2022-05-10 2022-05-10 09:01:05 0 0 0 0.00 0 0 0 1 0
7 2022-05-10 2022-05-10 09:01:06 0 0 0 0.00 0 0 0 1 0
8 2022-05-10 2022-05-10 09:01:07 0 0 0 0.00 0 0 0 1 0
9 2022-05-10 2022-05-10 09:01:08 0 5 2 5.39 0 0 0 1 0
10 2022-05-10 2022-05-10 09:01:09 20 33 3 38.70 0 0 0 1 0
11 2022-05-10 2022-05-10 09:01:10 14 26 29 41.39 0 0 2 0 1
12 2022-05-10 2022-05-10 09:01:11 11 0 4 11.70 0 0 0 1 1
13 2022-05-10 2022-05-10 09:01:12 0 0 0 0.00 0 0 0 1 0
14 2022-05-10 2022-05-10 09:01:13 0 0 0 0.00 0 0 0 1 0
15 2022-05-10 2022-05-10 09:01:14 82 126 113 188.07 0 3 0 0 1
16 2022-05-10 2022-05-10 09:01:15 60 64 47 99.52 0 0 2 0 1
17 2022-05-10 2022-05-10 09:01:16 98 140 236 291.38 0 0 2 0 0
18 2022-05-10 2022-05-10 09:01:17 151 118 221 292.52 0 0 2 0 0
19 2022-05-10 2022-05-10 09:01:18 44 13 99 109.11 0 0 2 0 0
20 2022-05-10 2022-05-10 09:01:19 6 6 53 53.67 0 0 2 0 0
21 2022-05-10 2022-05-10 09:01:20 39 8 65 76.22 0 0 2 0 0
22 2022-05-10 2022-05-10 09:01:21 17 20 57 62.75 0 0 2 0 0
23 2022-05-10 2022-05-10 09:01:22 51 46 269 277.63 0 0 2 0 0
24 2022-05-10 2022-05-10 09:01:23 15 45 82 94.73 0 3 0 0 1
25 2022-05-10 2022-05-10 09:01:24 22 34 4 40.69 0 0 2 0 1
26 2022-05-10 2022-05-10 09:01:25 114 93 41 152.73 0 0 2 0 0
27 2022-05-10 2022-05-10 09:01:26 74 67 92 135.75 0 0 2 0 0
28 2022-05-10 2022-05-10 09:01:27 117 9 40 123.98 0 0 2 0 0
29 2022-05-10 2022-05-10 09:01:28 33 15 0 36.25 0 0 0 1 1
30 2022-05-10 2022-05-10 09:01:29 0 0 0 0.00 0 0 0 1 0
I have the code to separate by date, and to "kinda" plot, but I need it for the 4 columns.
graph_pre <- mutate(graph_pre, day = lubridate::day(Date))
ggplot(graph_pre, aes(x = Time, y = Posture))+
geom_point()+
facet_wrap(~day, scales = "free_x")
dput(head(graph_pre,30))
structure(list(Date = structure(c(19122, 19122, 19122, 19122,
19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122,
19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122,
19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122), class = "Date"),
Time = structure(c(1652187660, 1652187661, 1652187662, 1652187663,
1652187664, 1652187665, 1652187666, 1652187667, 1652187668,
1652187669, 1652187670, 1652187671, 1652187672, 1652187673,
1652187674, 1652187675, 1652187676, 1652187677, 1652187678,
1652187679, 1652187680, 1652187681, 1652187682, 1652187683,
1652187684, 1652187685, 1652187686, 1652187687, 1652187688,
1652187689), class = c("POSIXct", "POSIXt"), tzone = ""),
Axis1 = c(21, 0, 24, 48, 0, 0, 0, 0, 0, 20, 14, 11, 0, 0,
82, 60, 98, 151, 44, 6, 39, 17, 51, 15, 22, 114, 74, 117,
33, 0), Axis2 = c(40, 36, 1, 31, 6, 0, 0, 0, 5, 33, 26, 0,
0, 0, 126, 64, 140, 118, 13, 6, 8, 20, 46, 45, 34, 93, 67,
9, 15, 0), Axis3 = c(2, 1, 0, 4, 0, 0, 0, 0, 2, 3, 29, 4,
0, 0, 113, 47, 236, 221, 99, 53, 65, 57, 269, 82, 4, 41,
92, 40, 0, 0), VM = c(45.22, 36.01, 24.02, 57.28, 6, 0, 0,
0, 5.39, 38.7, 41.39, 11.7, 0, 0, 188.07, 99.52, 291.38,
292.52, 109.11, 53.67, 76.22, 62.75, 277.63, 94.73, 40.69,
152.73, 135.75, 123.98, 36.25, 0), Standing = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), Stepping = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0,
0, 0, 0, 0), Cycling = c(2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2,
0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 0),
New_Sitting = c(0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1), Counter = c(0L,
1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L)), row.names = c(NA,
30L), class = "data.frame")

First thing, we should pivot_longer to pull the four posture columns into name-value pairs. Here I've put the names into the "Posture" column. Then we can map that to color and use the values for the y axis.
I've specified the range in scale_y_continuous, but it could also be done with coord_cartesian(ylim = c(0.5,4.5)) -- the difference will be that the out of range points are filtered out in this way, but are in some sense "still there" if you use the coord_cartesian option. That can make a difference if you are doing a summary step, like geom_boxplot or geom_smooth.
Finally, I use theme to specify the y-axis related elements that should be hidden.
library(tidyverse)
graph %>%
mutate(day = lubridate::day(Date)) %>%
pivot_longer(Standing:New_Sitting, names_to = "Posture") %>%
ggplot(aes(x = Time, y = value, color = Posture))+
geom_point()+
scale_y_continuous(limits = c(0.5,4.5), expand = expansion(0)) +
facet_wrap(~day, scales = "free_x") +
labs(title = "Posture vs. Time") +
theme(axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank())

Here you go:
library(tidyverse)
graph_pre_long <- graph_pre %>% pivot_longer(c(Standing, New_Sitting , Stepping, Cycling), names_to = "Posture")
ggplot(graph_pre_long, aes(x = Time, y = value, color = Posture))+
geom_point()+
facet_wrap(~day, scales = "free_x") +
ylim(.5, 4.5) +
theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank())

Related

How to merge multiple columns

I want to merge 4 columns together, (Standing, Stepping, Cycling, New_Sitting). In this case, I want to create a new column (called "Posture"). This new column (as per the example below) should be like:
Posture
<dbl>
2
3
2
1
1
1
3
4
4
4
Here is an example of my data:
> head(graph_pre,30)
# A tibble: 30 × 11
# Groups: Date [1]
Date Time Axis1 Axis2 Axis3 VM Standing Stepping Cycling New_Sitting
<date> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2022-03-14 2022-03-14 09:51:00 89 41 39 105. 0 0 2 0
2 2022-03-14 2022-03-14 09:51:01 88 135 117 199. 0 3 0 0
3 2022-03-14 2022-03-14 09:51:02 0 61 8 61.5 0 0 2 0
4 2022-03-14 2022-03-14 09:51:03 0 25 0 25 0 0 0 1
5 2022-03-14 2022-03-14 09:51:04 0 0 0 0 0 0 0 1
6 2022-03-14 2022-03-14 09:51:05 0 0 0 0 0 0 0 1
7 2022-03-14 2022-03-14 09:51:06 0 24 35 42.4 0 3 0 0
8 2022-03-14 2022-03-14 09:51:07 0 28 0 28 4 0 0 0
9 2022-03-14 2022-03-14 09:51:08 4 96 20 98.1 4 0 0 0
10 2022-03-14 2022-03-14 09:51:09 0 11 0 11 4 0 0 0
# … with 20 more rows, and 1 more variable: Counter <int>
Please let me know if you need more information as I'm new to this.
EDIT
> dput(head(graph_pre,30))
structure(list(Date = structure(c(19065, 19065, 19065, 19065,
19065, 19065, 19065, 19065, 19065, 19065, 19065, 19065, 19065,
19065, 19065, 19065, 19065, 19065, 19065, 19065, 19065, 19065,
19065, 19065, 19065, 19065, 19065, 19065, 19065, 19065), class = "Date"),
Time = structure(c(1647265860, 1647265861, 1647265862, 1647265863,
1647265864, 1647265865, 1647265866, 1647265867, 1647265868,
1647265869, 1647265870, 1647265871, 1647265872, 1647265873,
1647265874, 1647265875, 1647265876, 1647265877, 1647265878,
1647265879, 1647265880, 1647265881, 1647265882, 1647265883,
1647265884, 1647265885, 1647265886, 1647265887, 1647265888,
1647265889), tzone = "", class = c("POSIXct", "POSIXt")),
Axis1 = c(89, 88, 0, 0, 0, 0, 0, 0, 4, 0, 3, 9, 5, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 11, 3, 0), Axis2 = c(41,
135, 61, 25, 0, 0, 24, 28, 96, 11, 91, 44, 8, 8, 29, 1, 17,
0, 0, 0, 15, 0, 0, 0, 0, 28, 47, 28, 48, 0), Axis3 = c(39,
117, 8, 0, 0, 0, 35, 0, 20, 0, 22, 2, 16, 21, 48, 3, 35,
0, 5, 29, 32, 0, 0, 0, 0, 4, 26, 68, 5, 0), VM = c(105.47,
199.14, 61.52, 25, 0, 0, 42.44, 28, 98.14, 11, 93.67, 44.96,
18.57, 22.47, 56.09, 3.16, 38.91, 0, 5, 29, 35.34, 0, 0,
0, 0, 28.28, 55.26, 74.36, 48.35, 0), Standing = c(0, 0,
0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 0, 4, 0, 4, 4, 0, 0,
4, 4, 4, 4, 4, 0, 0, 4, 4), Stepping = c(0, 3, 0, 0, 0, 0,
3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 0, 3, 3, 0, 0, 0, 0,
0, 3, 3, 0, 0), Cycling = c(2, 0, 2, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0), New_Sitting = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Counter = c(0L,
1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -30L), groups = structure(list(
Date = structure(19065, class = "Date"), .rows = structure(list(
1:30), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))
What you can do is first replace the zeros with NA and after that unite the columns together. You can use the following code:
library(dplyr)
library(tidyr)
graph_pre %>%
mutate(across(Standing:New_Sitting, na_if, 0)) %>%
unite(Posture, Standing:New_Sitting, na.rm = TRUE, sep = '', remove = T) %>%
mutate(Posture = as.numeric(Posture))
Output:
# A tibble: 30 × 8
# Groups: Date [1]
Date Time Axis1 Axis2 Axis3 VM Posture Counter
<date> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 2022-03-14 2022-03-14 14:51:00 89 41 39 105. 2 0
2 2022-03-14 2022-03-14 14:51:01 88 135 117 199. 3 1
3 2022-03-14 2022-03-14 14:51:02 0 61 8 61.5 2 1
4 2022-03-14 2022-03-14 14:51:03 0 25 0 25 1 1
5 2022-03-14 2022-03-14 14:51:04 0 0 0 0 1 0
6 2022-03-14 2022-03-14 14:51:05 0 0 0 0 1 0
7 2022-03-14 2022-03-14 14:51:06 0 24 35 42.4 3 1
8 2022-03-14 2022-03-14 14:51:07 0 28 0 28 4 1
9 2022-03-14 2022-03-14 14:51:08 4 96 20 98.1 4 0
10 2022-03-14 2022-03-14 14:51:09 0 11 0 11 4 0
# … with 20 more rows
If you just want to merge them by summing the values for each row, you can do this:
library(tidyverse)
your_dataframe %>%
mutate(Posture = sum(Standing, Stepping, Cycling, New_Sitting))
Which will add an extra column called Posture at the end of your dataframe

grouped and stacked bar plots using plotly

I am new to plotly and not very good with R. I am trying to do stack plots and ended up with a very cumbersome code, that I am sure could be simplify using RColorbrewer and perhaps ggplot2 to group my stacked bar plots, but I am unsure on how to do it.
Below is the data I used, which is in a data.frame called data2
Nation glider radar AUV ROV USV corer towed_eq Seismic_eq Drill_rig Manned_sub Other clean
1 Belgium 0 0 1 1 1 3 0 0 0 0 0 6
2 Bulgaria 0 0 0 0 0 0 1 0 0 1 0 2
3 Croatia 0 2 1 2 0 0 0 0 0 0 0 5
4 Cyprus 3 0 0 0 0 0 0 0 0 0 0 3
5 Estonia 0 0 0 1 0 0 0 0 0 0 0 1
6 Finland 1 0 0 0 0 0 0 0 0 0 0 1
7 France 11 2 3 1 0 1 1 3 0 1 0 23
8 Germany 18 3 3 4 0 0 1 4 2 1 0 36
9 Greece 1 0 0 3 0 0 0 0 0 0 0 4
10 Ireland 0 0 0 2 0 0 0 0 0 0 0 2
11 Italy 10 8 3 2 4 0 0 1 0 0 0 28
12 Malta 0 2 0 0 0 0 0 0 0 0 0 2
13 Netherlands 0 2 0 0 0 0 0 0 0 0 0 2
14 Norway 17 3 1 3 0 1 3 1 0 0 1 30
15 Poland 0 0 0 1 0 0 0 0 0 0 0 1
16 Portugal 0 3 6 6 4 2 1 0 0 2 1 25
17 Romania 0 0 0 1 0 0 0 0 0 0 0 1
18 Slovenia 0 1 0 0 0 0 0 0 0 0 0 1
19 Spain 12 17 2 1 0 0 0 2 0 0 0 34
20 Sweden 0 2 1 3 0 0 0 0 0 0 0 6
21 Turkey 0 0 0 0 0 0 0 0 0 2 0 2
22 United Kingdom 0 0 13 4 1 11 4 2 1 0 4 40
23 Unknown 5 0 0 0 0 0 0 0 0 0 0 5
And this is the code I used
fig <- plot_ly(data2, x = ~Nation, y = ~glider, type = 'bar', name = 'Glider')
fig <- fig %>% add_trace(y = ~radar, name = 'Radar', marker=list(color='rgb(26, 118, 255)'))
fig <- fig %>% add_trace(y = ~AUV, name = 'AUV',marker=list(color='rgb(255, 128, 0)'))
fig <- fig %>% add_trace(y = ~ROV, name = 'ROV',marker=list(color='rgb(204, 0, 0)'))
fig <- fig %>% add_trace(y = ~USV, name = 'USV',marker=list(color='rgb(51, 255, 153)'))
fig <- fig %>% add_trace(y = ~corer, name = 'Corer',marker=list(color='rgb(204, 0, 204)'))
fig <- fig %>% add_trace(y = ~towed_eq, name = 'Towed equipment',marker=list(color='rgb(255, 255, 51)'))
fig <- fig %>% add_trace(y = ~Seismic_eq, name = 'Seismic equipment',marker=list(color='rgb(255, 204, 229)'))
fig <- fig %>% add_trace(y = ~Drill_rig, name = 'Drill rig',marker=list(color='rgb(102, 255, 255)'))
fig <- fig %>% add_trace(y = ~Manned_sub, name = 'Manned submersible',marker=list(color='rgb(128, 255, 0)'))
fig <- fig %>% add_trace(y = ~Other, name = 'Other equipment',marker=list(color='rgb(153, 153, 0)'))
fig <- fig %>% layout(xaxis = list(title = "",tickfont = list(size = 14)), yaxis = list(title = 'Number of assets',tickfont = list(size = 14)), barmode = 'stack')
fig
Is there an easier way to code this by using Rcolorbrewer instead of coding each color? and is it possible to group my stacked barplots Group1 (glider, auv, rov, usv), Group 2 (corer,towed_ew, seismic_eq, drill_rig) and Group 3 (radar, manned_sub, Other)?stack_plot
You can try this approach by melting the data:
library(dplyr)
library(plotly)
library(tidyr)
library(RColorBrewer)
#Data
data <- structure(list(Nation = c("Belgium", "Bulgaria", "Croatia", "Cyprus",
"Estonia", "Finland", "France", "Germany", "Greece", "Ireland",
"Italy", "Malta", "Netherlands", "Norway", "Poland", "Portugal",
"Romania", "Slovenia", "Spain", "Sweden", "Turkey", "United Kingdom",
"Unknown"), glider = c(0, 0, 0, 3, 0, 1, 11, 18, 1, 0, 10, 0,
0, 17, 0, 0, 0, 0, 12, 0, 0, 0, 5), radar = c(0, 0, 2, 0, 0,
0, 2, 3, 0, 0, 8, 2, 2, 3, 0, 3, 0, 1, 17, 2, 0, 0, 0), AUV = c(1,
0, 1, 0, 0, 0, 3, 3, 0, 0, 3, 0, 0, 1, 0, 6, 0, 0, 2, 1, 0, 13,
0), ROV = c(1, 0, 2, 0, 1, 0, 1, 4, 3, 2, 2, 0, 0, 3, 1, 6, 1,
0, 1, 3, 0, 4, 0), USV = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0,
0, 0, 0, 4, 0, 0, 0, 0, 0, 1, 0), corer = c(3, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 11, 0), towed_eq = c(0,
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 4,
0), Seismic_eq = c(0, 0, 0, 0, 0, 0, 3, 4, 0, 0, 1, 0, 0, 1,
0, 0, 0, 0, 2, 0, 0, 2, 0), Drill_rig = c(0, 0, 0, 0, 0, 0, 0,
2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0), Manned_sub = c(0,
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0,
0), Other = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 4, 0), clean = c(6, 2, 5, 3, 1, 1, 23, 36, 4,
2, 28, 2, 2, 30, 1, 25, 1, 1, 34, 6, 2, 40, 5)), row.names = c(NA,
-23L), class = "data.frame")
Now the code:
#First reshape
df2 <- pivot_longer(data,cols = -Nation)
#Plot
p <- plot_ly(df2, x = df2$Nation,
y = df2$value,
type = 'bar',
name = df2$name,
text = df2$value,
color = df2$name,
colors = brewer.pal(length(unique(df2$name)),
"Paired"))%>%
layout(barmode = 'stack',hoverlabel = list(bgcolor= 'white') ,bargap = 0.5) %>%
layout(xaxis = list(categoryorder = 'array',
categoryarray = df2$Nation), showlegend = T)
The output:

Include all variables in tsibble formula

I want to fit a linear regression model using the tsibble package and I have a bunch of dummy variables that I want to include in my analysis. A sample dataset would be the following:
library(tsibble)
library(dplyr)
library(fable)
ex = structure(list(id = c("KEY1", "KEY1", "KEY1", "KEY1", "KEY1",
"KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1",
"KEY1", "KEY1"), sales = c(0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), date = structure(c(15003, 15004, 15005, 15006, 15007,
15008, 15009, 15010, 15011, 15012, 15013, 15014, 15015, 15016,
15017), class = "Date"), wday = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L), dummy_1 = c(0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), dummy_2 = c(0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0), dummy_3 = c(0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -15L), key = structure(list(
id = "KEY1", .rows = list(1:15)), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), index = structure("date", ordered = TRUE), index2 = "date", interval = structure(list(
year = 0, quarter = 0, month = 0, week = 0, day = 1, hour = 0,
minute = 0, second = 0, millisecond = 0, microsecond = 0,
nanosecond = 0, unit = 0), class = "interval"), class = c("tbl_ts",
"tbl_df", "tbl", "data.frame"))
> ex
# A tsibble: 15 x 7 [1D]
# Key: id [1]
id sales date wday dummy_1 dummy_2 dummy_3
<chr> <dbl> <date> <int> <dbl> <dbl> <dbl>
1 KEY1 0 2011-01-29 1 0 0 0
2 KEY1 5 2011-01-30 2 0 0 0
3 KEY1 0 2011-01-31 3 0 0 1
4 KEY1 0 2011-02-01 4 1 0 0
5 KEY1 0 2011-02-02 5 0 0 0
6 KEY1 0 2011-02-03 6 0 0 0
7 KEY1 0 2011-02-04 7 0 1 0
8 KEY1 0 2011-02-05 1 0 0 0
9 KEY1 0 2011-02-06 2 0 0 0
10 KEY1 0 2011-02-07 3 0 0 0
11 KEY1 0 2011-02-08 4 0 0 0
12 KEY1 0 2011-02-09 5 0 0 0
13 KEY1 0 2011-02-10 6 0 0 0
14 KEY1 0 2011-02-11 7 0 0 0
15 KEY1 0 2011-02-12 1 0 0 0
They are too many dummies to specify manually so I was hoping for something faster. Normally I would use the . symbol in the formula in the following way:
fit = ex %>%
model(TSLM(sales ~ trend() + season() + .))
But this does not work:
Warning message:
1 error encountered for TSLM(sales ~ trend() + season() + .)
[1] '.' in formula and no 'data' argument
Is there a systematic tsibble way around this or do I have to create the formula on the fly using the names of the dataset?
We could create a formula with reformulate using the 'dummy' column names
nm1 <- names(ex)[startsWith(names(ex), 'dummy')]
ex %>%
model(lm = TSLM(reformulate(c(nm1, 'trend()', 'season()'), 'sales') ))

Plotting Traits on a Phylogeny

I am following this guide on how to plot traits onto a phylogeny to determine trait conservatism. I have followed it step by step but can't seem to get either the community composition or trait plots on phylogeny to work at all for my datasets. I have formatted just as they said and it looks just like their example data sents to me.
I am not sure how to put tree files on here so here is one on a cloud link for all species and here is a tree that I used just for my native species used for trait plotting
VegComm <- df2vec(as.matrix(Veg2018), colID = 1:29) #community data
STraits <- read.csv()
rownames(STraits)<- STraits[,1]
STraits[1:1] <- list(NULL) #Trait Data
STraits <- df2vec(as.matrix.data.frame(STraits), colID=1:5)
STraits <- STraits[1:6,]
str(STraits)
prune.sample(VegComm,alltree)
par(mfrow=c(2,2))
for (i in colnames(STraits)) {
+ plot(nativetree, show.tip.label=TRUE, main=i)
+ tiplabels(pch=22, col=STraits[,i]+1, bg=STraits[,i]+1, cex=1.5)}
traits <- STraits[nativetree$tip.label,]
phylosignal(nativetree, STraits, nsim=1000, method="K")
Here is the community data:
Avena_fatua Bromus_diandrus Bromus_hordeaceus Festuca_myuros Festuca_perennis Carduus_pycnocephalus Cirsium_vulgare Erodium_cicutarium Geranium_dissectum Helminthotheca_echioides Lactuca_serriola Medicago_polymorpha Oxalis_pes-caprae Raphanus_sativus Senecio_vulgaris Sonchus_oleraceus Vicia_sativa Artemisia_californica Baccharis_pilularis Ericameria_ericoides Mimulus_aurantiacus Bromus_carinatus Elymus_triticoides Hordeum_brachyantherum Stipa_pulchra Achillea_millefolium Eschscholzia_californica Lupinus_variicolor Echium_candicans
PC1 0 1.25 0 20.83333333 7.416666667 0.5 0 0 21.25 0.333333333 0 6.916666667 0 4.916666667 0 0 0 4.583333333 18.33333333 1.25 0.833333333 0.5 0 0 0 7.5 1.25 0 0
PC2 0.5 0 0.333333333 14.16666667 2.25 0 0 0 25 0 1.916666667 30.41666667 0 3.666666667 0.833333333 0.833333333 0 0 17.91666667 0 0 2.083333333 0 0 0 3.333333333 0 0 0
PC3 0.333333333 4.083333333 0 27.5 3.333333333 6.083333333 0 0 15.83333333 1.75 2.416666667 3.833333333 0 6.666666667 0 5.916666667 0 1.25 2.083333333 0 2.5 5.416666667 0 0 1.25 5 0 0 0
PC4 0.333333333 1.25 3.333333333 10.41666667 15.83333333 5.833333333 0 0 25.83333333 0 1.583333333 10.75 0 5.833333333 0 1.25 0 0 2.083333333 0 0 0 0 0 0 3.416666667 2.916666667 0 0
PC5 1.916666667 0 8.833333333 10.91666667 6.666666667 0 0.333333333 0 15 1.25 1.75 0 0 3.333333333 0 10.83333333 0.5 0 3.333333333 0.5 0 4.666666667 0 0 0.5 9.166666667 0 0 0.666666667
PS1 0.333333333 3 0 6.25 2.25 16 0 0 11.41666667 0.333333333 0 3.833333333 0 0.833333333 0 1.166666667 0 0 12 0 0.166666667 3.333333333 0 0 0 49.16666667 0 0 0
PS2 2.25 4 0 6.5 1.25 13.75 0 4.166666667 10.5 0 0 6.666666667 0 4.5 0 0 0 1.583333333 3.833333333 0 4.166666667 4.166666667 0 0 1.25 22.91666667 1.25 0 0
PS3 2.5 0 0 5.083333333 1.25 0.833333333 0 5.916666667 20.83333333 0 0 16.66666667 0 7.583333333 0 1.333333333 0 0 4.5 0 0 0.333333333 0 0 1.75 25.41666667 0 0 0
PS4 2.25 0 1.5 2.5 1.75 0 2.5 0 22.91666667 0 0 19.16666667 2.916666667 18.33333333 0 0 0 2.916666667 6.666666667 0 1.25 5.5 0 0 4.583333333 8.75 0 2.5 0
PS5 4.75 0 1.75 7 2.083333333 4.666666667 0 0 18.08333333 0 0 4.25 0 13.75 0 0 0 0 0 0 0 0 0 0 0 34.33333333 0 0 0
PW1 4.75 1.75 0.666666667 11.83333333 4.916666667 0 0 0 15 2.833333333 1.25 39.16666667 0 0.666666667 0 3.833333333 0 0 4.166666667 0 0 0.833333333 0 0 0 14.16666667 0.666666667 0 1.25
PW2 2.5 0 4 21.66666667 4.666666667 0.5 0 0 25.41666667 0 1.25 7.083333333 0 14.58333333 0 0.833333333 0 1.25 1.25 0 0 3.333333333 0 0 1.25 4.166666667 1 0 0
PW3 1.583333333 1.25 0 10.66666667 4.25 5.75 0 0 12.5 0 1.5 30 0 0.333333333 0 0.333333333 0 3.833333333 0 0 0 2.083333333 0 0 4.583333333 10 0 0 0
PW4 0 1.25 6.666666667 9.916666667 8.25 0 0 0 33.33333333 0 0 5.833333333 0 5.833333333 0 2.083333333 0 0 1.25 0 0 2.5 0 0 0 3.75 1.583333333 0 0
PW5 2.25 2.083333333 0.333333333 10.41666667 4.416666667 1.25 0 0 23.33333333 0 0 4.583333333 0 5.083333333 0 0 13.33333333 12.66666667 8.333333333 0 0 0 0 0 0 12 0 0 0
Here is the trait data: (I tried omitting and not omitting NAs)
Growth_Rate Area AreaVar SLA SLAVar VLA VLAVar Thickness ThicknessVar logThickness logThicknessVar LV LVVar PD0 PD10 PD50 CPD
Achillea_millefolium 0.090888257 15.80656659 12.43783158 NA NA NA NA 0.249744167 0.187092582 -1.553441666 0.458076381 NA NA 12.61566 29.016 250 0.721921544
Artemisia_californica 0.035049437 14.56355219 11.78670881 180.1322546 99.50427931 9.364236482 1.414207935 0.268703703 0.074128238 -1.352780779 0.298806173 43.22157529 13.35296757 12.61566 29.016 250 0.721921544
Bromus_carinatus 0.022607407 2.384166667 2.316140235 NA NA NA NA NA NA NA NA NA NA 5.41269 11.7111 315.3334 0.681203858
Ericameria_ericoides 0.019809977 3.6875 1.703521078 NA NA NA NA NA NA NA NA NA NA 12.61566 29.016 250 0.721921544
Eschscholzia_californica 0.029380702 1.245833333 1.076820745 262.1630059 60.49033956 4.392284625 0.596306575 0.16357684 0.038660691 -1.835819399 0.223972815 39.80718218 11.25985865 294 294 294 0.577356321
Hosackia_gracilis 0.009183502 NA NA NA NA NA NA NA NA NA NA NA NA 41.81336 101.22 250 0.638988811
Lupinus_nanus 0.040867178 NA NA NA NA NA NA NA NA NA NA NA NA 33.60001 101.22 250 0.640373244
Lupinus_variicolor 0.028428463 NA NA NA NA NA NA NA NA NA NA NA NA 33.60001 101.22 250 0.640373244
Mimulus_aurantiacus 0.00652489 0.00652489 0.011364841 3.412857143 2.976064883 151.5001201 79.68333552 2.370279914 0.731201273 0.285257143 0.120154396 37.54090305 16.93270863 183.7778 209.3333 250 0.622318052
Sisyrinchium_bellum 0.01441308 5.477777778 5.117901992 181.6818246 42.91299583 2.954769874 0.448780843 0.176855556 0.018545802 -1.735344864 0.107673785 31.80493389 4.311588188 225.2889 225.2889 315.3334 0.594958509
Sidalcea_malviflora 0.020075948 4.974358974 4.901863202 142.4036892 39.11274955 1.651824981 0.295753475 0.148082051 0.045211395 -1.953346759 0.300665842 20.91557187 8.108682659 163.3333 193 250 0.625836637
Stipa_pulchra 0.01546666 5.28968254 6.055307558 122.3827137 32.67582669 7.352684101 3.027753522 0.149629537 0.031130015 -1.943799376 0.210327301 17.91978995 5.823172424 24 24 315.3334 0.611910294
Here are the dput outputs:
> dput(STraits)
structure(c(0.035049437, 0.029380702, 0.00652489, 0.01441308,
0.020075948, 0.01546666, 14.56355219, 1.245833333, 0.00652489,
5.477777778, 4.974358974, 5.28968254, 11.78670881, 1.076820745,
0.011364841, 5.117901992, 4.901863202, 6.055307558, 180.1322546,
262.1630059, 3.412857143, 181.6818246, 142.4036892, 122.3827137,
99.50427931, 60.49033956, 2.976064883, 42.91299583, 39.11274955,
32.67582669), .Dim = c(6L, 5L), .Dimnames = list(c("Artemisia_californica",
"Eschscholzia_californica", "Mimulus_aurantiacus", "Sisyrinchium_bellum",
"Sidalcea_malviflora", "Stipa_pulchra"), c("Growth_Rate", "Area",
"AreaVar", "SLA", "SLAVar")))
> dput(VegComm)
structure(list(Avena_fatua = c(0, 0.5, 0.333333333, 0.333333333,
1.916666667, 0.333333333, 2.25, 2.5, 2.25, 4.75, 4.75, 2.5, 1.583333333,
0, 2.25), Bromus_diandrus = c(1.25, 0, 4.083333333, 1.25, 0,
3, 4, 0, 0, 0, 1.75, 0, 1.25, 1.25, 2.083333333), Bromus_hordeaceus = c(0,
0.333333333, 0, 3.333333333, 8.833333333, 0, 0, 0, 1.5, 1.75,
0.666666667, 4, 0, 6.666666667, 0.333333333), Festuca_myuros = c(20.83333333,
14.16666667, 27.5, 10.41666667, 10.91666667, 6.25, 6.5, 5.083333333,
2.5, 7, 11.83333333, 21.66666667, 10.66666667, 9.916666667, 10.41666667
), Festuca_perennis = c(7.416666667, 2.25, 3.333333333, 15.83333333,
6.666666667, 2.25, 1.25, 1.25, 1.75, 2.083333333, 4.916666667,
4.666666667, 4.25, 8.25, 4.416666667), Carduus_pycnocephalus = c(0.5,
0, 6.083333333, 5.833333333, 0, 16, 13.75, 0.833333333, 0, 4.666666667,
0, 0.5, 5.75, 0, 1.25), Cirsium_vulgare = c(0, 0, 0, 0, 0.333333333,
0, 0, 0, 2.5, 0, 0, 0, 0, 0, 0), Erodium_cicutarium = c(0, 0,
0, 0, 0, 0, 4.166666667, 5.916666667, 0, 0, 0, 0, 0, 0, 0), Geranium_dissectum = c(21.25,
25, 15.83333333, 25.83333333, 15, 11.41666667, 10.5, 20.83333333,
22.91666667, 18.08333333, 15, 25.41666667, 12.5, 33.33333333,
23.33333333), Helminthotheca_echioides = c(0.333333333, 0, 1.75,
0, 1.25, 0.333333333, 0, 0, 0, 0, 2.833333333, 0, 0, 0, 0), Lactuca_serriola = c(0,
1.916666667, 2.416666667, 1.583333333, 1.75, 0, 0, 0, 0, 0, 1.25,
1.25, 1.5, 0, 0), Medicago_polymorpha = c(6.916666667, 30.41666667,
3.833333333, 10.75, 0, 3.833333333, 6.666666667, 16.66666667,
19.16666667, 4.25, 39.16666667, 7.083333333, 30, 5.833333333,
4.583333333), Oxalis_pes.caprae = c(0, 0, 0, 0, 0, 0, 0, 0, 2.916666667,
0, 0, 0, 0, 0, 0), Raphanus_sativus = c(4.916666667, 3.666666667,
6.666666667, 5.833333333, 3.333333333, 0.833333333, 4.5, 7.583333333,
18.33333333, 13.75, 0.666666667, 14.58333333, 0.333333333, 5.833333333,
5.083333333), Senecio_vulgaris = c(0, 0.833333333, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Sonchus_oleraceus = c(0, 0.833333333,
5.916666667, 1.25, 10.83333333, 1.166666667, 0, 1.333333333,
0, 0, 3.833333333, 0.833333333, 0.333333333, 2.083333333, 0),
Vicia_sativa = c(0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0,
0, 13.33333333), Artemisia_californica = c(4.583333333, 0,
1.25, 0, 0, 0, 1.583333333, 0, 2.916666667, 0, 0, 1.25, 3.833333333,
0, 12.66666667), Baccharis_pilularis = c(18.33333333, 17.91666667,
2.083333333, 2.083333333, 3.333333333, 12, 3.833333333, 4.5,
6.666666667, 0, 4.166666667, 1.25, 0, 1.25, 8.333333333),
Ericameria_ericoides = c(1.25, 0, 0, 0, 0.5, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), Mimulus_aurantiacus = c(0.833333333, 0,
2.5, 0, 0, 0.166666667, 4.166666667, 0, 1.25, 0, 0, 0, 0,
0, 0), Bromus_carinatus = c(0.5, 2.083333333, 5.416666667,
0, 4.666666667, 3.333333333, 4.166666667, 0.333333333, 5.5,
0, 0.833333333, 3.333333333, 2.083333333, 2.5, 0), Elymus_triticoides = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
Hordeum_brachyantherum = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), Stipa_pulchra = c(0, 0, 1.25,
0, 0.5, 0, 1.25, 1.75, 4.583333333, 0, 0, 1.25, 4.583333333,
0, 0), Achillea_millefolium = c(7.5, 3.333333333, 5, 3.416666667,
9.166666667, 49.16666667, 22.91666667, 25.41666667, 8.75,
34.33333333, 14.16666667, 4.166666667, 10, 3.75, 12), Eschscholzia_californica = c(1.25,
0, 0, 2.916666667, 0, 0, 1.25, 0, 0, 0, 0.666666667, 1, 0,
1.583333333, 0), Lupinus_variicolor = c(0, 0, 0, 0, 0, 0,
0, 0, 2.5, 0, 0, 0, 0, 0, 0), Echium_candicans = c(0, 0,
0, 0, 0.666666667, 0, 0, 0, 0, 0, 1.25, 0, 0, 0, 0)), .Names = c("Avena_fatua",
"Bromus_diandrus", "Bromus_hordeaceus", "Festuca_myuros", "Festuca_perennis",
"Carduus_pycnocephalus", "Cirsium_vulgare", "Erodium_cicutarium",
"Geranium_dissectum", "Helminthotheca_echioides", "Lactuca_serriola",
"Medicago_polymorpha", "Oxalis_pes.caprae", "Raphanus_sativus",
"Senecio_vulgaris", "Sonchus_oleraceus", "Vicia_sativa", "Artemisia_californica",
"Baccharis_pilularis", "Ericameria_ericoides", "Mimulus_aurantiacus",
"Bromus_carinatus", "Elymus_triticoides", "Hordeum_brachyantherum",
"Stipa_pulchra", "Achillea_millefolium", "Eschscholzia_californica",
"Lupinus_variicolor", "Echium_candicans"), row.names = c("PC1",
"PC2", "PC3", "PC4", "PC5", "PS1", "PS2", "PS3", "PS4", "PS5",
"PW1", "PW2", "PW3", "PW4", "PW5"), class = "data.frame")
> dput(nativetree)
structure(list(edge = structure(c(12L, 13L, 14L, 15L, 16L, 16L,
15L, 14L, 17L, 18L, 18L, 19L, 19L, 17L, 13L, 12L, 20L, 21L, 21L,
20L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 17L, 18L, 4L, 19L, 5L, 6L,
7L, 8L, 20L, 21L, 9L, 10L, 11L), .Dim = c(20L, 2L)), edge.length = c(7.629639,
22, 20.333344, 93.62796, 11.038696, 11.038696, 104.666656, 28.5,
62.899994, 33.600006, 16.800003, 16.800003, 16.800003, 96.5,
147, 41.985199, 51.760712, 60.883728, 60.883728, 112.64444),
Nnode = 10L, node.label = c("", "eudicots", "", "euasterids",
"", "eurosids", "mesopapilionoideaeclade", "lupinus", "",
""), tip.label = c("achillea_millefolium", "ericameria_ericoides",
"mimulus_aurantiacus", "hosackia_gracilis", "lupinus_nanus",
"lupinus_variicolor", "sidalcea_malviflora", "eschscholzia_californica",
"bromus_carinatus", "nassella_pulchra", "sisyrinchium_bellum"
), root.edge = 291.370361), .Names = c("edge", "edge.length",
"Nnode", "node.label", "tip.label", "root.edge"), class = "phylo", order = "cladewise")
The problem is that names of species do not match between STraits and nativetree.
intersect(row.names(STraits), nativetree$tip.label)
# character(0)
R is case-sensitive, so lower case names in the tree will not be recognised as identical to capitalised names in the data matrix. Also, the names of the species differ.
Once the names properly match, the traits need to be ordered as above:
traits <- STraits[nativetree$tip.label,]
and the phylogenetic signal calculated from the new traits table per column:
library(picante)
res = data.frame()
for(i in 1:ncol(traits)){
res[i, ] = phylosignal(x = traits[, i], phy = nativetree, reps = 999)
}
Note that I use the data you provided with dput, not the modifications implied with the script. Additionally, check ?phylosignal for syntax.
Continuous characters may be plotted on a phylogeny with the phytools package as shown here.

cumsum according to certain restricts in r

I have a large data of car accidents and a sample of it is provided below.
accident is a binary variable of whether the accident happens or
not.
shift_number is the number of the shift, 0 means the driver is
taking a rest and not a shift.
time_diff is the amount of time at each observation.
df <- data.frame(
accident = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
shift_number = c(1, 1, 0, 0, 0, 2, 2, 2, 0, 0, 3, 3, 3, 3, 3),
time_diff = 3:17
)
My question is to measure the total amount of working time since the driver starts this shift for each accident.
wanted <- data.frame
(
accident = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
shift_number = c(1, 1, 0, 0, 0, 2, 2, 2, 0, 0, 3, 3, 3, 3, 3),
time_diff = 3:17,
cum_time = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 75)
)
Does anyone have ideas on solving this problem with R? It's better to have data.table or vectorised solution because I've got huge data to deal with.
df$cum_time = 0
accident = which(df$accident == 1)
df$cum_time[accident] <- sapply(accident, function(x) {
sum(df$time_diff[(which.max(cumsum(df$shift_number[1:x] == 0)) + 1): x])
})
df
# accident shift_number time_diff cum_time
#1 0 1 3 0
#2 0 1 4 0
#3 0 0 5 0
#4 0 0 6 0
#5 0 0 7 0
#6 0 2 8 0
#7 0 2 9 0
#8 0 2 10 0
#9 0 0 11 0
#10 0 0 12 0
#11 0 3 13 0
#12 1 3 14 27
#13 0 3 15 0
#14 0 3 16 0
#15 1 3 17 75
We first make all the values in cum_time variable as 0. We find the indices where accident has occurred. For each of those indices we find the latest 0 in shift_number and calculate the sum of values of time_diff from the latest 0 to x and assign it to its respective indices.
Use the ave function to compute the cumulative sum of time_diff by shift_number:
cumsum_by_shift <- ave(df$time_diff, df$shift_number, FUN=cumsum)
#[1] 3 7 5 11 18 8 17 27 29 41 13 27 42 58 75
Pick out elements of cumsum_by_shift where accidents occur:
cum_time <- ifelse(df$accident == 1, cumsum_by_shift, 0)
#[1] 0 0 0 0 0 0 0 0 0 0 0 27 0 0 75
Note the use of the vectorized ifelse function.

Resources