I have this kind of data:
> data_example
date A B C D E F
1 2020-09-22 1.3 0.0 1.3 0.3 0.9 0.0
2 2020-09-23 0.7 0.0 0.7 0.0 0.7 0.0
3 2020-09-24 0.4 0.0 0.4 0.0 0.4 0.0
4 2020-09-25 0.2 0.2 0.5 0.0 0.2 0.0
5 2020-09-26 1.0 0.0 1.0 0.0 1.0 0.0
6 2020-09-27 0.2 0.2 0.5 0.1 0.1 0.0
7 2020-09-28 0.6 0.1 0.7 0.0 0.6 0.0
8 2020-09-29 0.4 0.1 0.5 0.1 0.2 0.0
9 2020-09-30 0.4 0.1 0.6 0.0 0.4 0.0
10 2020-10-01 1.0 0.1 1.1 0.8 0.1 0.0
11 2020-10-02 0.6 0.1 0.8 0.2 0.4 0.0
I would like to plot more than one of the columns (A, B, C...) in the same time series plot, but without using add_trace. The reason is that I am building a Shiny app in which the user chooses, via a selectize input, which variables to plot, so the traces cannot be hard-coded as individual add_trace calls.
Is there another way to achieve that?
Thanks.
Edit:
Output of the dput(data_example)
data_example <- structure(list(date = c("2020-09-22", "2020-09-23", "2020-09-24",
"2020-09-25", "2020-09-26", "2020-09-27", "2020-09-28", "2020-09-29",
"2020-09-30", "2020-10-01", "2020-10-02"), A = c(1.3, 0.7, 0.4,
0.2, 1, 0.2, 0.6, 0.4, 0.4, 1, 0.6), B = c(0, 0, 0, 0.2, 0, 0.2,
0.1, 0.1, 0.1, 0.1, 0.1), C = c(1.3, 0.7, 0.4, 0.5, 1, 0.5, 0.7,
0.5, 0.6, 1.1, 0.8), D = c(0.3, 0, 0, 0, 0, 0.1, 0, 0.1, 0, 0.8,
0.2), E = c(0.9, 0.7, 0.4, 0.2, 1, 0.1, 0.6, 0.2, 0.4, 0.1, 0.4
), F = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA,
-11L))
You should reshape your data.frame to long format.
I prefer library(data.table) for this - see the melt call. After that you may use split or color to generate the traces:
library(data.table)
library(plotly)
DF <- data.frame(
date = c("2020-09-22","2020-09-23","2020-09-24",
"2020-09-25","2020-09-26","2020-09-27","2020-09-28",
"2020-09-29","2020-09-30","2020-10-01","2020-10-02"),
A = c(1.3, 0.7, 0.4, 0.2, 1, 0.2, 0.6, 0.4, 0.4, 1, 0.6),
B = c(0, 0, 0, 0.2, 0, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1),
C = c(1.3, 0.7, 0.4, 0.5, 1, 0.5, 0.7, 0.5, 0.6, 1.1, 0.8),
D = c(0.3, 0, 0, 0, 0, 0.1, 0, 0.1, 0, 0.8, 0.2),
E = c(0.9, 0.7, 0.4, 0.2, 1, 0.1, 0.6, 0.2, 0.4, 0.1, 0.4),
F = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)
setDT(DF)
longDF <- melt(DF, id.vars = "date")
plot_ly(longDF, type = "scatter", mode = "lines+markers", x = ~date, y = ~value, split = ~variable)
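For the Shiny use case, one way to avoid add_trace is to melt only the columns the user picked and let split create one trace per variable. The following is a minimal sketch rather than a full app: selected is a hypothetical stand-in for the value of the selectize input (for example input$vars), and the data is shortened to three dates.

```r
library(data.table)
library(plotly)

# Shortened copy of the example data (first three dates only)
DF <- data.table(
  date = c("2020-09-22", "2020-09-23", "2020-09-24"),
  A = c(1.3, 0.7, 0.4),
  B = c(0, 0, 0),
  C = c(1.3, 0.7, 0.4)
)

# Hypothetical stand-in for the selectize input value, e.g. input$vars
selected <- c("A", "C")

# Melt only the chosen columns; unselected columns are dropped
longSel <- melt(DF, id.vars = "date", measure.vars = selected)

# One trace per selected variable, without any add_trace call
p <- plot_ly(longSel, type = "scatter", mode = "lines+markers",
             x = ~date, y = ~value, split = ~variable)
```

In an app, the melt and plot_ly calls would presumably sit inside renderPlotly(), with selected replaced by the input value, so the plot is rebuilt whenever the selection changes.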
I am writing a function that uses a dataframe as filtering criteria for a big dataframe containing model outputs. These are the filtering criteria (as a df):
parameter value
1 alpha 0.1
2 beta 0.1
3 eta 0.1
4 zeta 0.1
5 lambda 0.5
6 phi 5.0
7 kappa 1.0
dput(values)
structure(list(parameter = structure(c(1L, 2L, 3L, 7L, 5L, 6L,
4L), .Label = c("alpha", "beta", "eta", "kappa", "lambda", "phi",
"zeta"), class = "factor"), value = c(0.1, 0.1, 0.1, 0.1, 0.5,
5, 1)), class = "data.frame", row.names = c(NA, -7L))
And this is what the 'outputs' df looks like:
time w x y z alpha beta eta zeta lambda phi kappa
1 0.0 10.00000 10.00000 10.000000 10.000000 0.1 0.1 0.1 0.1 0.95 5 1
1.1 0.1 10.00572 11.04680 9.896057 9.054394 0.1 0.1 0.1 0.1 0.95 5 1
1.2 0.2 10.01983 12.17827 9.592536 8.215338 0.1 0.1 0.1 0.1 0.95 5 1
1.3 0.3 10.04010 13.37290 9.112223 7.483799 0.1 0.1 0.1 0.1 0.95 5 1
1.4 0.4 10.06377 14.60353 8.489174 6.855626 0.1 0.1 0.1 0.1 0.95 5 1
1.5 0.5 10.08778 15.83982 7.764470 6.323152 0.1 0.1 0.1 0.1 0.95 5 1
dput(outputs)
structure(list(time = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 276.5, 276.6,
276.7, 276.8, 276.9, 276.961144437566), w = c(10, 10.0057192322758,
10.0198266325956, 10.040096099625, 10.0637654242843, 10.087779652849,
-1.71585943177118, -2.04004317987084, -2.56315700921588, -3.56775247519687,
-6.37643561014456, -13.828470036737), x = c(10, 11.0467963604334,
12.1782709261765, 13.3728962503142, 14.6035317074526, 15.8398164069251,
27.2774474452024, 26.3099862348669, 24.8705756934881, 22.3379071188018,
15.8960461541267, 3.62452931346518e-144), y = c(10, 9.89605687874935,
9.59253574727296, 9.11222320249057, 8.48917353431654, 7.76447036695841,
-0.604572230605542, -0.878231815857628, -1.46586965791714, -3.20623046085508,
-14.9365932475767, -3.30552834129368e+146), z = c(10, 9.05439359565339,
8.21533762023494, 7.48379901688836, 6.85562632179817, 6.3231517466183,
42.3149654949179, 43.8836626616462, 46.4372543252026, 51.7183454733949,
72.7027555440752, 3.30552834129368e+146), alpha = c(0.1, 0.1,
0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5), beta = c(0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5), eta = c(0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5), zeta = c(0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9), lambda = c(0.9,
0.9, 0.5, 0.5, 0.9, 0.9, 0.5, 0.9, 0.5, 0.9, 0.5, 0.5
), phi = c(5, 5, 5, 5, 5, 5, 20, 20, 20, 20, 20, 20), kappa = c(1,
1, 1, 1, 1, 1, 10, 10, 10, 10, 10, 10), ode_outputs..iteration.. = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c("1",
"1.1", "1.2", "1.3", "1.4", "1.5", "2916.2765", "2916.2766",
"2916.2767", "2916.2768", "2916.2769", "2916.2770"), class = "data.frame")
So it should be something like:
filtered_outputs <- outputs %>% filter(all rows in column 1 == all values in column 2)
The names under the 'parameter' column correspond to column names in the 'outputs' df. I'd like this to be not hard-coded, so that I can feed in any filtering criteria as a df and the function will filter 'outputs'. I'd like to use dplyr or baseR preferably.
So you want to select all the rows in the outputs dataframe that match the values in the values dataframe?
Here is a base R approach using sweep and rowSums.
result <- outputs[rowSums(sweep(outputs[as.character(values$parameter)], 2,
values$value, `!=`)) == 0, ]
result
# time w x y z alpha beta eta zeta lambda phi kappa
#1.2 0.2 10.01983 12.17827 9.592536 8.215338 0.1 0.1 0.1 0.1 0.5 5 1
#1.3 0.3 10.04010 13.37290 9.112223 7.483799 0.1 0.1 0.1 0.1 0.5 5 1
# ode_outputs..iteration..
#1.2 NA
#1.3 NA
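Since the question asks for a reusable function, the same idea can be wrapped up as below. This is a sketch under stated assumptions: filter_by_criteria and its tol argument are hypothetical names, the criteria data frame is assumed to have parameter and value columns as in the question, and the comparison uses a small tolerance instead of exact equality because model outputs are floating-point numbers.

```r
# Hypothetical helper: keep rows of `data` whose columns named in
# criteria$parameter equal the matching criteria$value (within `tol`)
filter_by_criteria <- function(data, criteria, tol = 1e-8) {
  cols <- as.character(criteria$parameter)
  keep <- rep(TRUE, nrow(data))
  for (i in seq_along(cols)) {
    keep <- keep & abs(data[[cols[i]]] - criteria$value[i]) <= tol
  }
  keep[is.na(keep)] <- FALSE  # rows with missing parameters never match
  data[keep, , drop = FALSE]
}

# Small toy data in the same shape as the question's data frames
toy <- data.frame(
  time   = c(0, 0.1, 0.2, 0.3),
  alpha  = c(0.1, 0.1, 0.1, 0.5),
  lambda = c(0.9, 0.5, 0.5, 0.5)
)
crit <- data.frame(parameter = c("alpha", "lambda"), value = c(0.1, 0.5))

res <- filter_by_criteria(toy, crit)
```

Here res keeps only the rows with time 0.1 and 0.2, the two rows where alpha is 0.1 and lambda is 0.5.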
A possible dplyr and tidyr solution:
Create a helper data frame by turning the values data frame into wide format, and apply a semi-join to filter by the required conditions.
You could easily wrap this up in one continuous workflow but I think it's easier to understand in separate steps.
library(dplyr)
library(tidyr)
conditions <-
values %>%
pivot_wider(names_from = parameter, values_from = value)
outputs %>%
semi_join(conditions)
#> Joining, by = c("alpha", "beta", "eta", "zeta", "lambda", "phi", "kappa")
#> time w x y z alpha beta eta zeta lambda phi
#> 1.2 0.2 10.01983 12.17827 9.592536 8.215338 0.1 0.1 0.1 0.1 0.5 5
#> 1.3 0.3 10.04010 13.37290 9.112223 7.483799 0.1 0.1 0.1 0.1 0.5 5
#> kappa ode_outputs..iteration..
#> 1.2 1 NA
#> 1.3 1 NA
Created on 2021-07-08 by the reprex package (v2.0.0)
I often find this kind of thing easier when the data is in long format, although this is just a preference:
outputs %>%
tidyr::pivot_longer(
cols = -c(time, w, x, y, z, ode_outputs..iteration..),
names_to="parameter", values_to="value_truth"
) %>%
dplyr::left_join(filter_df, by = "parameter") %>%
dplyr::group_by(time) %>%
dplyr::filter(all(value == value_truth)) %>%
dplyr::select(-value) %>%
tidyr::pivot_wider(
names_from="parameter",
values_from="value_truth"
)
Output:
# A tibble: 2 x 13
# Groups: time [2]
time w x y z ode_outputs..iteration.. alpha beta eta zeta lambda phi kappa
<dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.2 10.0 12.2 9.59 8.22 NA 0.1 0.1 0.1 0.1 0.5 5 1
2 0.3 10.0 13.4 9.11 7.48 NA 0.1 0.1 0.1 0.1 0.5 5 1
Data:
outputs = structure(list(time = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 276.5, 276.6,
276.7, 276.8, 276.9, 276.961144437566), w = c(10, 10.0057192322758,
10.0198266325956, 10.040096099625, 10.0637654242843, 10.087779652849,
-1.71585943177118, -2.04004317987084, -2.56315700921588, -3.56775247519687,
-6.37643561014456, -13.828470036737), x = c(10, 11.0467963604334,
12.1782709261765, 13.3728962503142, 14.6035317074526, 15.8398164069251,
27.2774474452024, 26.3099862348669, 24.8705756934881, 22.3379071188018,
15.8960461541267, 3.62452931346518e-144), y = c(10, 9.89605687874935,
9.59253574727296, 9.11222320249057, 8.48917353431654, 7.76447036695841,
-0.604572230605542, -0.878231815857628, -1.46586965791714, -3.20623046085508,
-14.9365932475767, -3.30552834129368e+146), z = c(10, 9.05439359565339,
8.21533762023494, 7.48379901688836, 6.85562632179817, 6.3231517466183,
42.3149654949179, 43.8836626616462, 46.4372543252026, 51.7183454733949,
72.7027555440752, 3.30552834129368e+146), alpha = c(0.1, 0.1,
0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5), beta = c(0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5), eta = c(0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5), zeta = c(0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9), lambda = c(0.9,
0.9, 0.5, 0.5, 0.9, 0.9, 0.5, 0.9, 0.5, 0.9, 0.5, 0.5
), phi = c(5, 5, 5, 5, 5, 5, 20, 20, 20, 20, 20, 20), kappa = c(1,
1, 1, 1, 1, 1, 10, 10, 10, 10, 10, 10), ode_outputs..iteration.. = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c("1",
"1.1", "1.2", "1.3", "1.4", "1.5", "2916.2765", "2916.2766",
"2916.2767", "2916.2768", "2916.2769", "2916.2770"), class = "data.frame")
filter_df = data.table::fread(' parameter value
1 alpha 0.1
2 beta 0.1
3 eta 0.1
4 zeta 0.1
5 lambda 0.5
6 phi 5.0
7 kappa 1.0') %>% dplyr::select(-V1)
Here is my sample data:
mydata<-structure(list(x1 = c(0, 8.6, 11.2, 8.4, 0, 0), x2 = c(0, 0,
7.8, 7.6, 1.2, 10.2), y1 = c(0, 0, 3.4, 21.4, 1.8, 1.4), y2 = c(7.8,
7.6, 1.2, 10.2, 7, 0), z1 = c(0, 1.6, 7.6, 23.6, 3.2, 0), z2 = c(8.6,
1.4, 0, 0, 0, 0)), .Names = c("x1", "x2", "y1", "y2", "z1", "z2"
), class = "data.frame", row.names = c(NA, -6L))
x1 x2 y1 y2 z1 z2
1 0.0 0.0 0.0 7.8 0.0 8.6
2 8.6 0.0 0.0 7.6 1.6 1.4
3 11.2 7.8 3.4 1.2 7.6 0.0
4 8.4 7.6 21.4 10.2 23.6 0.0
5 0.0 1.2 1.8 7.0 3.2 0.0
6 0.0 10.2 1.4 0.0 0.0 0.0
With the code below, it is possible to group columns as x, y and z.
grps <- unique(gsub("[0-9]", "", colnames(mydata)))
# [1] "x" "y" "z"
But when I rename the columns like this:
myd<-structure(list(X2005 = c(0, 8.6, 11.2, 8.4, 0, 0), X2005.1 = c(0,
0, 7.8, 7.6, 1.2, 10.2), X2006 = c(0, 0, 3.4, 21.4, 1.8, 1.4),
X2006.1 = c(7.8, 7.6, 1.2, 10.2, 7, 0), X2007 = c(0, 1.6,
7.6, 23.6, 3.2, 0), X2007.1 = c(8.6, 1.4, 0, 0, 0, 0)), .Names = c("X2005",
"X2005.1", "X2006", "X2006.1", "X2007", "X2007.1"), row.names = c(NA,
6L), class = "data.frame")
X2005 X2005.1 X2006 X2006.1 X2007 X2007.1
1 0.0 0.0 0.0 7.8 0.0 8.6
2 8.6 0.0 0.0 7.6 1.6 1.4
3 11.2 7.8 3.4 1.2 7.6 0.0
4 8.4 7.6 21.4 10.2 23.6 0.0
5 0.0 1.2 1.8 7.0 3.2 0.0
6 0.0 10.2 1.4 0.0 0.0 0.0
I want to see:
# [1] "2005" "2006" "2007"
We can use gsub to match the letter 'X' at the beginning (^) of the string, or (|) a literal . followed by digits at the end ($) of the string, and replace the match with a blank ("").
names(myd) <- gsub("^X|\\.\\d+$", "", names(myd))
names(myd)
#[1] "2005" "2005" "2006" "2006" "2007" "2007"
unique(names(myd))
#[1] "2005" "2006" "2007"
If we know the number of digits and position, then substr would be faster
substr(names(myd), 2, 5)
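Once the year labels are extracted, they can also drive the grouping of columns without renaming anything. The sketch below assumes a two-row excerpt of myd, and the per-year row sums are only an illustrative choice of summary; split.default is used because it splits a data frame by columns rather than by rows.

```r
# Two-row excerpt of the renamed data
myd <- data.frame(
  X2005 = c(0, 8.6), X2005.1 = c(0, 0),
  X2006 = c(0, 0),   X2006.1 = c(7.8, 7.6),
  X2007 = c(0, 1.6), X2007.1 = c(8.6, 1.4)
)

# Strip the leading X and the trailing .<digits> to get one year per column
years <- gsub("^X|\\.\\d+$", "", names(myd))
grps <- unique(years)

# Split the columns by year and sum across each group, row by row
totals <- sapply(split.default(myd, years), rowSums)
```

totals is a matrix with one column per year ("2005", "2006", "2007") and one row per row of myd.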
One option would be to use sub and convert the names to a factor with labels as needed.
names(mydata) <- factor(sub("[0-9]", "", names(mydata)), labels = 2005:2007)
and then check your column names
names(mydata)
#[1] "2005" "2005" "2006" "2006" "2007" "2007"
I'd like to create a new column where each value is a random subset of other values from that row in my data.
# Example data:
df <- data.frame(matrix(nrow = 57, ncol = 6)) %>%
mutate(
X1 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X2 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X3 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X4 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X5 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X6 = round(rnorm(n = 57, mean = 0, sd = 1), 1)
)
# my failed attempt at a new column
df %>%
rowwise() %>%
mutate(X7 = str_c(df[, sample(1:6, 3, replace = F)]), sep = ", ")
A solution using tidyverse. The key is to split the data frame by row and apply a function that samples the values within each one-row subset. map_df does this and combines all the results into one data frame. df2 is the final output.
# Load package
library(tidyverse)
# Set seed
set.seed(123)
# Create example data frame
df <- data.frame(matrix(nrow = 57, ncol = 6)) %>%
mutate(
X1 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X2 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X3 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X4 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X5 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
X6 = round(rnorm(n = 57, mean = 0, sd = 1), 1)
)
# Process the data
df2 <- df %>%
rowid_to_column() %>%
split(f = .$rowid) %>%
map_df(function(dt){
dt_sub <- dt %>%
select(-rowid) %>%
select(sample(1:6, 3, replace = FALSE)) %>%
unite(X7, everything(), sep = ", ")
return(dt_sub)
}) %>%
bind_cols(df) %>%
select(paste0("X", 1:7))
df2
X1 X2 X3 X4 X5 X6 X7
1 -0.6 0.6 0.5 0.1 0.9 0.1 0.1, 0.5, 0.9
2 -0.2 0.1 0.3 0.0 -1.0 0.2 0.1, 0.3, 0.2
3 1.6 0.2 0.1 2.1 2.0 1.6 1.6, 2.1, 0.1
4 0.1 0.4 -0.6 -0.7 -0.1 -0.2 0.1, 0.4, -0.6
5 0.1 -0.5 -0.8 -1.1 0.2 0.2 0.1, 0.2, -0.5
6 1.7 -0.3 -1.0 0.0 -0.7 1.2 -1, -0.7, -0.3
7 0.5 -1.0 0.1 0.3 -0.6 1.1 0.5, -0.6, -1
...
I believe that the best way is to use base R functions replicate, sample and sapply.
inx <- t(replicate(nrow(df), sample(1:6, 3, replace = FALSE)))
df$X7 <- sapply(seq_len(nrow(df)), function(i)
paste(df[i, inx[i, ]], collapse = ", "))
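A more compact variant of the same base R idea is to sample directly from each row with apply. This is a sketch, assuming the six value columns are named X1 to X6 as in the question; value_cols is a name introduced here for illustration.

```r
set.seed(123)

# Example data: 57 rows, six numeric columns X1..X6
df <- as.data.frame(matrix(round(rnorm(57 * 6), 1), nrow = 57, ncol = 6))
names(df) <- paste0("X", 1:6)

value_cols <- paste0("X", 1:6)

# For each row, draw three of the six values without replacement
# and collapse them into one comma-separated string
df$X7 <- apply(df[value_cols], 1, function(row) {
  paste(sample(row, 3, replace = FALSE), collapse = ", ")
})
```

Restricting apply to value_cols keeps the sampling from ever picking up X7 itself if the code is rerun on the same data frame.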
This is a solution in dplyr:
library(dplyr)
df %>%
group_by(idx = seq(n())) %>%
do({
res <- select(., -idx)
bind_cols(res, X7 = toString(sample(unlist(res),
3, replace = FALSE)))
}) %>%
ungroup() %>%
select(-idx)
The result:
# A tibble: 57 x 7
X1 X2 X3 X4 X5 X6 X7
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 0.4 0.4 -0.1 3.4 0.9 -0.4 0.4, 0.9, 0.4
2 1.5 0.9 -0.7 1.5 -1.1 -0.3 -0.7, 1.5, -1.1
3 -0.1 -0.5 -0.6 -0.8 -0.3 2.3 -0.3, 2.3, -0.8
4 0.7 -1.0 0.3 0.2 -0.5 -0.3 -1, 0.3, -0.3
5 0.6 0.9 0.4 1.9 -0.7 -2.0 0.4, -2, 0.9
6 0.3 0.7 1.3 0.6 1.3 -0.2 0.7, -0.2, 1.3
7 0.5 0.3 1.1 -0.2 -0.4 -0.8 0.5, 1.1, 0.3
8 0.4 -1.9 0.8 -0.6 -1.1 0.4 0.4, -1.9, -0.6
9 0.2 -1.5 -1.9 1.0 0.0 0.6 0, 1, 0.6
10 -0.2 0.7 -0.5 1.4 0.3 -0.1 -0.2, 0.3, -0.5
I have a dataframe with measurements stored as a list by row.
Subject Measurements
1 s1 -0.4, -0.9, -1.1, -0.1, 0.1
2 s2 -1.4, -1.7, -1.7, -0.6, -1.7
3 s3 -1.0, -0.1, -0.6, -0.5, -0.1
4 s4 -0.2, -0.5, -0.2, 0.1, -0.7
5 s5 0.7, 0.2, 0.4, 0.7, 0.2
6 s6 -0.3, -0.1, 0.1, -0.2, -0.1
How do I compute the mean, standard deviation, or other summaries of each list and add the result to a new column in the data frame (e.g. "mean")?
Edit
Here's the data structure I'm working with:
structure(list(Subject = structure(1:6, .Label = c("s1", "s2",
"s3", "s4", "s5", "s6"), class = "factor"), Measurements = list(
c(-0.4, -0.9, -1.1, -0.1, 0.1), c(-1.4, -1.7, -1.7, -0.6,
-1.7), c(-1, -0.1, -0.6, -0.5, -0.1), c(-0.2, -0.5, -0.2,
0.1, -0.7), c(0.7, 0.2, 0.4, 0.7, 0.2), c(-0.3, -0.1, 0.1,
-0.2, -0.1))), .Names = c("Subject", "Measurements"), row.names = c(NA,
6L), class = "data.frame")
If you store your data more efficiently, this becomes much easier:
dat<- structure(list(Subject = structure(1:6, .Label = c("s1", "s2",
"s3", "s4", "s5", "s6"), class = "factor"), Measurements = list(
c(-0.4, -0.9, -1.1, -0.1, 0.1), c(-1.4, -1.7, -1.7, -0.6,
-1.7), c(-1, -0.1, -0.6, -0.5, -0.1), c(-0.2, -0.5, -0.2,
0.1, -0.7), c(0.7, 0.2, 0.4, 0.7, 0.2), c(-0.3, -0.1, 0.1,
-0.2, -0.1))), .Names = c("Subject", "Measurements"), row.names = c(NA,
6L), class = "data.frame")
> dat <- data.frame(subject = dat$Subject,do.call(rbind,dat$Meas))
> dat$means <- apply(dat[,-1],1,mean)
> dat
subject X1 X2 X3 X4 X5 means
1 s1 -0.4 -0.9 -1.1 -0.1 0.1 -0.48
2 s2 -1.4 -1.7 -1.7 -0.6 -1.7 -1.42
3 s3 -1.0 -0.1 -0.6 -0.5 -0.1 -0.46
4 s4 -0.2 -0.5 -0.2 0.1 -0.7 -0.30
5 s5 0.7 0.2 0.4 0.7 0.2 0.44
6 s6 -0.3 -0.1 0.1 -0.2 -0.1 -0.12
Once you have each measurement in its own column, you can simply use apply (or rowMeans) or some similar function.
It looks like Measurements is a matrix within your data.frame (df).
df$means <- rowMeans(df$Measurements)
For a more general solution you can use apply with MARGIN = 1 for a given function.
df$SDs <- apply(df$Measurements, 1, sd)
If Measurements were actually a genuine list you'd use
df$SDs <- lapply(df$Measurements, sd)
That gives maximum performance but now your SDs column is a list so to make it a vector I'd go with...
df$SDs <- sapply(df$Measurements, sd)
(when I made a data.frame with a list included it didn't look like that so I didn't think it was really a list at first).
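For the genuine list-column case shown in the question's dput output, the summaries can be added as ordinary numeric columns. The sketch below uses the first three subjects only and swaps in vapply, which behaves like sapply but checks that every result is a single number.

```r
# First three subjects from the question, with Measurements as a list column
dat <- data.frame(Subject = c("s1", "s2", "s3"))
dat$Measurements <- list(
  c(-0.4, -0.9, -1.1, -0.1, 0.1),
  c(-1.4, -1.7, -1.7, -0.6, -1.7),
  c(-1.0, -0.1, -0.6, -0.5, -0.1)
)

# One summary value per list element, stored as plain numeric columns
dat$mean <- vapply(dat$Measurements, mean, numeric(1))
dat$sd   <- vapply(dat$Measurements, sd, numeric(1))
```

The same pattern works for any function that returns one number per subject, such as median or max.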