I have the time t, speed v and position x of a vehicle, along with other information: info1, info2, status.
How can I extend the tibble by linearly extrapolating the position from the last speed (v = 14) for given timestamps? The added rows should be copies of the last row, except that x holds the predicted position and status is changed to "predicted".
Is it possible to do this without using a loop?
tbl <- tibble(info1 = rep("a", 3),
info2 = rep("b", 3),
t = c(1, 2, 3),
v = c(12, 13, 14),
x = c(12, 24, 37),
status = rep("real", 3))
timestamps <- c(4, 5, 6, 8) # timestamps are not evenly spaced
# desired output
tbl_desired <- tibble(info1 = rep("a", 7),
info2 = rep("b", 7),
t = c(1, 2, 3, 4, 5, 6, 8),
v = c(12, 13, 14, 14, 14, 14, 14),
x = c(12, 24, 37, 51, 65, 79, 107),
status = c(rep("real", 3), rep("predicted", 4)))
The conditions are not entirely clear, especially how the predicted values in 'x' are defined. The code below works as follows:
Extract the last row (slice_tail, n = 1)
Update the columns 't', 'v', 'x' (summarise)
Bind the new rows to the original data (bind_rows)
library(dplyr)
tbl %>%
slice_tail(n = 1) %>%
summarise(info1, info2, t = timestamps, v = v,
x = (x + cumsum(c(1, diff(t)) *
rep(last(v), length(t)))), status = 'predicted') %>%
bind_rows(tbl, .)
-output
# A tibble: 7 × 6
info1 info2 t v x status
<chr> <chr> <dbl> <dbl> <dbl> <chr>
1 a b 1 12 12 real
2 a b 2 13 24 real
3 a b 3 14 37 real
4 a b 4 14 51 predicted
5 a b 5 14 65 predicted
6 a b 6 14 79 predicted
7 a b 8 14 107 predicted
If there are many columns, after slicing the last row, use mutate to update only the columns that need to be changed, wrap the new values in a list wherever their length is greater than 1, and then unnest the list columns:
library(tidyr)
tbl %>%
slice_tail(n = 1) %>%
mutate(t = list(timestamps), v = v,
x = list((x + cumsum(c(1, diff(timestamps)) *
rep(last(v), length(timestamps))))), status = 'predicted') %>%
unnest(where(is.list)) %>%
bind_rows(tbl, .)
-output
# A tibble: 7 × 6
info1 info2 t v x status
<chr> <chr> <dbl> <dbl> <dbl> <chr>
1 a b 1 12 12 real
2 a b 2 13 24 real
3 a b 3 14 37 real
4 a b 4 14 51 predicted
5 a b 5 14 65 predicted
6 a b 6 14 79 predicted
7 a b 8 14 107 predicted
Or use add_row and then fill the NA rows with the previous non-NA value for the columns not specified in add_row:
library(tibble)
library(tidyr) # for fill()
tbl %>%
add_row(t = timestamps, v = last(.$v),
x = (last(.$x) + cumsum(c(1, diff(timestamps)) *
rep(last(.$v), length(timestamps)))), status = 'predicted') %>%
fill(everything())
-output
# A tibble: 7 × 6
info1 info2 t v x status
<chr> <chr> <dbl> <dbl> <dbl> <chr>
1 a b 1 12 12 real
2 a b 2 13 24 real
3 a b 3 14 37 real
4 a b 4 14 51 predicted
5 a b 5 14 65 predicted
6 a b 6 14 79 predicted
7 a b 8 14 107 predicted
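Since the speed is held constant, the predicted position can also be written directly as x_last + v_last * (t - t_last). This avoids relying on the first new timestamp being exactly one time unit after the last observation, which the `c(1, diff(timestamps))` term above assumes. A minimal sketch, assuming the same `tbl` and `timestamps` as in the question; `last_row`, `predicted` and `result` are names introduced here for illustration only:

```r
library(dplyr)
library(tibble)

tbl <- tibble(info1 = rep("a", 3),
              info2 = rep("b", 3),
              t = c(1, 2, 3),
              v = c(12, 13, 14),
              x = c(12, 24, 37),
              status = rep("real", 3))
timestamps <- c(4, 5, 6, 8)

# Last observed row: source of the constant speed and the starting position
last_row <- slice_tail(tbl, n = 1)

# Repeat the last row once per timestamp, then overwrite x, t and status
predicted <- last_row[rep(1, length(timestamps)), ] %>%
  mutate(x = last_row$x + last_row$v * (timestamps - last_row$t),
         t = timestamps,
         status = "predicted")

result <- bind_rows(tbl, predicted)
```

All other columns (info1, info2, v) are carried over from the last row unchanged, so this should also scale to tables with many columns.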
I am looking to generate a longitudinal dataset. I have generated my pat numbers and treatment groups:
library(dplyr)
set.seed(420)
Pat_TNO <- 1001:1618
data.frame(Pat_TNO = Pat_TNO) %>%
rowwise() %>%
mutate(
trt = rbinom(1, 1, 0.5)
)
My timepoints (in days) are:
timepoint_weeks <- c(seq(2, 12, 2), 16, 20, 24, 52)
timepoint_days <- 7 * timepoint_weeks
How can I pivot this dataset using the vector timepoint_days, so that I have 10 rows per participant and the column names Pat_TNO, trt, timepoint_days?
You can use the unnest function from tidyr to achieve what you want.
Here is the code
library(dplyr)
library(tidyr)
set.seed(420)
Pat_TNO <- 1001:1618
x <- data.frame(Pat_TNO = Pat_TNO) %>%
rowwise() %>%
mutate(
trt = rbinom(1, 1, 0.5)
)
timepoint_weeks <- c(seq(2, 12, 2), 16, 20, 24, 52)
timepoint_days <- 7 * timepoint_weeks
x %>%
mutate(timepoint_days = list(timepoint_days)) %>%
unnest(timepoint_days)
Output
# A tibble: 6,180 × 3
Pat_TNO trt timepoint_days
<int> <int> <dbl>
1 1001 1 14
2 1001 1 28
3 1001 1 42
4 1001 1 56
5 1001 1 70
6 1001 1 84
7 1001 1 112
8 1001 1 140
9 1001 1 168
10 1001 1 364
# … with 6,170 more rows
Here I used the mutate function to add a list column that contains the full timepoint_days vector in every row. unnest then expands each of those lists into one row per element, giving 10 rows per participant.
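An alternative that skips the intermediate list column is tidyr's expand_grid, which pairs every row of a data frame with every element of a vector. A minimal sketch, assuming a small stand-in table `pats` (a hypothetical name, with 3 participants instead of 618 and hand-picked trt values):

```r
library(tidyr)

# Small stand-in for the participant table
pats <- data.frame(Pat_TNO = 1001:1003, trt = c(1L, 0L, 1L))

timepoint_weeks <- c(seq(2, 12, 2), 16, 20, 24, 52)
timepoint_days <- 7 * timepoint_weeks

# Every participant row is paired with every timepoint
long <- expand_grid(pats, timepoint_days = timepoint_days)
```

The result has one row per participant and timepoint, with the columns Pat_TNO, trt and timepoint_days.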
I have a data frame like this:
df <- data.frame(grp = c(rep("a", 5), rep("b", 5)), t = c(1:5, 1:5), value = c(-1, 5, 9, -15, 6, 5, 1, 7, -11, 9))
# Limits for desired cumulative sum (CumSum)
maxCumSum <- 8
minCumSum <- 0
What I would like to calculate is a cumulative sum of value by group (grp) within the values of maxCumSum and minCumSum. The respective table dt2 should look something like this:
grp t value CumSum
a 1 -1 0
a 2 5 5
a 3 9 8
a 4 -15 0
a 5 6 6
b 1 5 5
b 2 1 6
b 3 7 8
b 4 -11 0
b 5 9 8
Think of CumSum as a water storage tank which has a certain maximum capacity and whose level cannot sink below zero.
The normal cumsum obviously does not do the trick, since it has no upper or lower limit. Does anyone have a suggestion for how to achieve this? In the real data frame there are of course more than 2 groups and far more than 5 time steps.
Many thanks!
What you can do is create a function that adds one value to the running total and clamps the result between the min and max values, then apply it step by step with Reduce, like this:
df <- data.frame(grp = c(rep("a", 5), rep("b", 5)), t = c(1:5, 1:5), value = c(-1, 5, 9, -15, 6, 5, 1, 7, -11, 9))
library(dplyr)
maxCumSum <- 8
minCumSum <- 0
f <- function(x, y) max(min(x + y, maxCumSum), minCumSum)
df %>%
group_by(grp) %>%
mutate(CumSum = Reduce(f, value, 0, accumulate = TRUE)[-1])
#> # A tibble: 10 × 4
#> # Groups: grp [2]
#> grp t value CumSum
#> <chr> <int> <dbl> <dbl>
#> 1 a 1 -1 0
#> 2 a 2 5 5
#> 3 a 3 9 8
#> 4 a 4 -15 0
#> 5 a 5 6 6
#> 6 b 1 5 5
#> 7 b 2 1 6
#> 8 b 3 7 8
#> 9 b 4 -11 0
#> 10 b 5 9 8
Created on 2022-07-04 by the reprex package (v2.0.1)
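The same idea can be wrapped in a helper that takes the limits as arguments instead of reading them from global variables. This is only a sketch; `clamped_cumsum` and its parameter names are made up for illustration:

```r
library(dplyr)

# Running sum that is clamped to [lower, upper] after every step
clamped_cumsum <- function(x, lower, upper, start = 0) {
  step <- function(level, change) max(min(level + change, upper), lower)
  # accumulate = TRUE keeps every intermediate level; drop the start value
  Reduce(step, x, start, accumulate = TRUE)[-1]
}

df <- data.frame(grp = c(rep("a", 5), rep("b", 5)),
                 t = c(1:5, 1:5),
                 value = c(-1, 5, 9, -15, 6, 5, 1, 7, -11, 9))

result <- df %>%
  group_by(grp) %>%
  mutate(CumSum = clamped_cumsum(value, lower = 0, upper = 8)) %>%
  ungroup()
```

Because the limits are parameters, different groups or columns could use different capacities without redefining the function.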
I am new to R. I want to know which row from the seq meets the criteria (is >= 10).
df1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 7, 8, 9,10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
aa = c('A','B','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','Y'))
seq <- data.frame(aa = c("ABC","BCD","CDE","DEF","EFG"))
counter = 0;
y <- NULL;
for (i in seq$aa) {
df_splited<- strsplit(i, "") #split aa
df_new <- do.call(cbind, df_splited)
df_2 <- as.data.frame(df_new)
counter = counter + 1
new_df <- df_2 %>% #match two dfs
pivot_longer(cols = everything(), values_to = 'aa') %>%
left_join(df1, by = 'aa')
new_df$value <- sum(new_df) #summarise value
res <- new_df %>% filter(value >= 10) #if value >= 10 print it out
y <- rbind(y, res)
print(y)
}
Expected outcome for res:
value row_n
12 3
15 4
18 5
I don't know how to add row_n to the outcome inside the for loop.
Let's take the first example ("ABC"):
from df1 we know that: A = 1 ; B = 2 ; C = 3
the sum of ABC = 6
since sum for ABC is not equal or greater than 10 it will be ignored for the final output.
For "EFG" the sum is 18 so it will be included in the final output.
Then I would like to know that EFG was the one that met the criteria so ideally the output would be a row from the seq that met criteria.
You can sum the values of the strings in seq like this:
seq$val <- sapply(strsplit(seq$aa, ""), function(x) sum(df1$value[match(x, df1$aa)]))
So that seq becomes:
seq
#> aa val
#> 1 ABC 6
#> 2 BCD 9
#> 3 CDE 12
#> 4 DEF 15
#> 5 EFG 18
And you can keep the rows that meet the criterion (>= 10) with
seq[seq$val >= 10, ]
#> aa val
#> 3 CDE 12
#> 4 DEF 15
#> 5 EFG 18
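If the row number is wanted as an explicit column, as in the expected outcome of the question, which() returns the positions that satisfy the condition. A small self-contained sketch; `seq_df`, `val` and `res` are illustrative names, and the data frame is called `seq_df` here only to avoid reusing the name of the base function seq:

```r
df1 <- data.frame(value = 1:20,
                  aa = strsplit("ABCDEFGHIKLMNPQRSTVY", "")[[1]],
                  stringsAsFactors = FALSE)
seq_df <- data.frame(aa = c("ABC", "BCD", "CDE", "DEF", "EFG"),
                     stringsAsFactors = FALSE)

# Sum of the letter values for every string
val <- sapply(strsplit(seq_df$aa, ""),
              function(x) sum(df1$value[match(x, df1$aa)]))

# Positions of the strings whose sum meets the criterion
row_n <- which(val >= 10)
res <- data.frame(value = val[row_n], row_n = row_n, aa = seq_df$aa[row_n])
```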
A (slightly more verbose) tidyverse approach, assuming you only want to calculate the sum for three-letter combinations:
library(dplyr)
library(tidyr)
library(stringr)
seq %>%
mutate(split = str_split(aa, "")) %>%
unnest_wider(split, names_repair = ~c("aa", paste0("letter_", 1:3))) %>%
# match() returns positions in df1$aa, which equal df1$value in this example
mutate(across(starts_with("letter_"), ~ match(.x, df1$aa)),
letter_sum = apply(across(starts_with("letter_")), 1, sum))
also gives:
# A tibble: 5 x 5
aa letter_1 letter_2 letter_3 letter_sum
<chr> <int> <int> <int> <int>
1 ABC 1 2 3 6
2 BCD 2 3 4 9
3 CDE 3 4 5 12
4 DEF 4 5 6 15
5 EFG 5 6 7 18
I have some tidy data, and one of the groups is a blank:
df <- data.frame(Group = c(rep(LETTERS[1:3], 3), "Blank", "Blank", "Blank"),
ID = rep(1:3, 4),
Value = c(10, 11, 12, 21, 22, 23, 31, 32, 33, 1, 2, 3))
df
Group ID Value
1 A 1 10
2 B 2 11
3 C 3 12
4 A 1 21
5 B 2 22
6 C 3 23
7 A 1 31
8 B 2 32
9 C 3 33
10 Blank 1 1
11 Blank 2 2
12 Blank 3 3
I want to subtract Blank from each group (A, B, C), so that the normalized data looks like this:
df_normalized<- data.frame(Group = rep(LETTERS[1:3], 3),
ID = rep(1:3, 3),
Value = c(9, 9, 9, 20, 20, 20, 30, 30, 30))
df_normalized
Group ID Value
1 A 1 9
2 B 2 9
3 C 3 9
4 A 1 20
5 B 2 20
6 C 3 20
7 A 1 30
8 B 2 30
9 C 3 30
How can I do this nicely using dplyr?
EDIT:
How can I do that for multiple groups? E.g.:
df <- data.frame(Cluster = c(rep("C1", 12), rep("C2", 12)),
Group = rep(c(rep(LETTERS[1:3], 3), "Blank", "Blank", "Blank"), 2),
ID = rep(1:3, 8),
Value = sample(24))
Assuming you'll have only one "Blank" value per ID as shown in the example, you can do
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Value = Value - Value[Group == "Blank"]) %>%
filter(Group != "Blank")
# Group ID Value
# <fct> <int> <dbl>
#1 A 1 9
#2 B 2 9
#3 C 3 9
#4 A 1 20
#5 B 2 20
#6 C 3 20
#7 A 1 30
#8 B 2 30
#9 C 3 30
If you have more than one "Blank" per ID, you can use match, which ensures that only the first value is selected.
df %>%
group_by(ID) %>%
mutate(Value = Value - Value[match("Blank", Group)]) %>%
filter(Group != "Blank")
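For the multi-cluster case from the EDIT, one option is to add Cluster to the grouping so that each blank is only subtracted within its own cluster. A minimal sketch, assuming exactly one "Blank" row per Cluster and ID combination as in the example data; the seed and the name `normalized` are additions made here for reproducibility and illustration:

```r
library(dplyr)

set.seed(1)  # only so that sample() is reproducible
df <- data.frame(Cluster = c(rep("C1", 12), rep("C2", 12)),
                 Group = rep(c(rep(LETTERS[1:3], 3), "Blank", "Blank", "Blank"), 2),
                 ID = rep(1:3, 8),
                 Value = sample(24))

normalized <- df %>%
  group_by(Cluster, ID) %>%                          # one blank per Cluster/ID
  mutate(Value = Value - Value[Group == "Blank"]) %>%
  ungroup() %>%
  filter(Group != "Blank")
```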
Given a data frame, I want to iterate over multiple column names, calculate something for each, and add the output as a new column. The next calculation is then added as a new column to the updated data frame.
For example:
Given a simple df:
df <- structure(list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9),
d = c(10, 11, 12)), .Names = c("a", "b", "c", "d"), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
For each column a, b, c, d I want to calculate, say a square:
a2 = a^2
b2 = b^2 ...
For technical reasons I can't publish the whole data frame, but I am going to pass one column name at a time and expect the function to add a new column (for example a2). The next time, when I add b2, a2 will already be there.
With a for loop it would look like:
for (x in column_names) {
df <- df %>% mutate("x2" = x^2)
}
So each time my df updates with new calculated column.
Please advise how I can do this without a for loop, using functional programming.
I have tried map and lapply, but the problem is that my df doesn't get updated on each iteration.
Is this the function you are looking for?
add_x2 <- function(df, x) {
df[paste0(x, "2")] <- df[x]^2
df
}
df %>%
add_x2(c("a", "b"))
# A tibble: 3 x 6
a b c d a2 b2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 7 10 1 16
2 2 5 8 11 4 25
3 3 6 9 12 9 36
With tidyverse (note that this overwrites the columns in place rather than adding new ones):
df %>%
mutate_if(is.numeric, ~ .^2)
# A tibble: 3 x 4
a b c d
<dbl> <dbl> <dbl> <dbl>
1 1 16 49 100
2 4 25 64 121
3 9 36 81 144
or
df %>% mutate_all(~ .^2)
# A tibble: 3 x 4
a b c d
<dbl> <dbl> <dbl> <dbl>
1 1 16 49 100
2 4 25 64 121
3 9 36 81 144
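To add the squares as new columns (a2, b2, ...) instead of overwriting the originals, across() with the .names argument is one option in dplyr 1.0.0 or later. A short sketch, assuming the column names arrive as a character vector called `column_names` as in the question's loop; `df_new` is an illustrative name:

```r
library(dplyr)

df <- tibble::tibble(a = c(1, 2, 3), b = c(4, 5, 6),
                     c = c(7, 8, 9), d = c(10, 11, 12))
column_names <- c("a", "b")

# Square the selected columns and name each result "<column>2"
df_new <- df %>%
  mutate(across(all_of(column_names), ~ .x^2, .names = "{.col}2"))
```

Calling the same pipeline again on df_new with another set of names, for example c("c", "d"), keeps the previously added columns and appends the new ones.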