I have a dataset with IDs and associated timepoints. I want to keep only the IDs that have a specific combination of timepoints. If I filter using %in% or |, I also get IDs that have only part of the combination. How do I do this in R?
ID  Timepoint
1   1
1   6
1   12
2   1
3   1
3   6
3   12
3   18
4   1
4   6
4   12
I want to keep the IDs that have all of timepoints 1, 6 and 12 and exclude the other IDs.
The result would be IDs 1, 3 and 4.
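library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr
df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
                 Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))
df %>%
  filter(Timepoint %in% c(1, 6, 12)) %>%
  mutate(indicator = 1) %>%
  group_by(ID) %>%
  # add the missing timepoints per ID; their indicator is NA
  complete(Timepoint = c(1, 6, 12)) %>%
  # drop every ID that has at least one missing timepoint
  filter(!ID %in% pull(filter(., is.na(indicator)), ID)) %>%
  select(indicator)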
Output:
# A tibble: 9 × 2
# Groups: ID [3]
ID indicator
<dbl> <dbl>
1 1 1
2 1 1
3 1 1
4 3 1
5 3 1
6 3 1
7 4 1
8 4 1
9 4 1
We can use
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(all(c(1, 6, 12) %in% Timepoint)) %>%
  ungroup()
Output:
# A tibble: 10 x 2
ID Timepoint
<dbl> <dbl>
1 1 1
2 1 6
3 1 12
4 3 1
5 3 6
6 3 12
7 3 18
8 4 1
9 4 6
10 4 12
In your data, ID 2 has time point 1. So if you filter on time points 1, 6 and 12 with %in%, the result will be IDs 1, 2, 3 and 4 instead of 1, 3 and 4.
ids <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4)
time_points <- c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
dat <- data.frame(ids, time_points)
unique(dat$ids[dat$time_points %in% c(1, 6, 12)])
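The %in% filter above keeps any ID that has at least one of the three time points, which is why ID 2 shows up. A minimal base R sketch of the "all three" check, reusing the dat from this answer (the names required, has_all, any_ids and all_ids are only illustrative):

```r
dat <- data.frame(
  ids = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
  time_points = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
)
required <- c(1, 6, 12)

# IDs that have at least one of the required time points
any_ids <- unique(dat$ids[dat$time_points %in% required])

# For each ID, check that every required time point is present
has_all <- tapply(dat$time_points, dat$ids,
                  function(tp) all(required %in% tp))

# IDs that have all of the required time points
all_ids <- as.numeric(names(has_all)[as.vector(has_all)])
```

Here any_ids still contains ID 2, while all_ids does not.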
Related
I would like to reshape the data sample below to get output like the table. How can I do that? The idea is to split column e into two columns according to disease: values with disease 0 in one column and values with disease 1 in the other. Thanks in advance.
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), fid = c(1,
1, 2, 2, 3, 3, 4, 4, 5, 5), disease = c(0, 1, 0, 1, 1, 0, 1, 0, 0,
1), e = c(3, 2, 6, 1, 2, 5, 2, 3, 1, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
library(tidyverse)
df %>%
  # name id_cols explicitly; passing it by position is deprecated in recent tidyr
  pivot_wider(id_cols = fid, names_from = disease, values_from = e, names_prefix = 'e') %>%
  select(-fid)
e0 e1
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1
If you want the columns named e1 and e2, you could do:
df %>%
  pivot_wider(id_cols = fid, names_from = disease, values_from = e,
              names_glue = 'e{disease + 1}') %>%
  select(-fid)
# A tibble: 5 x 2
e1 e2
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1
We could use lead() combined with ifelse() statements for this:
library(dplyr)
df %>%
mutate(e2 = lead(e)) %>%
filter(row_number() %% 2 == 1) %>%
mutate(e1 = ifelse(disease==1, e2,e),
e2 = ifelse(disease==0, e2,e)) %>%
select(e1, e2)
e1 e2
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1
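The lead() approach relies on the two rows of each fid being adjacent. As a cross-check that does not depend on row order, here is a small base R sketch using tapply() on the same data (the object name wide is illustrative, and it assumes exactly one row per fid/disease pair):

```r
df <- data.frame(
  id      = 1:10,
  fid     = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  disease = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  e       = c(3, 2, 6, 1, 2, 5, 2, 3, 1, 1)
)

# One cell per (fid, disease) pair; each cell holds that pair's single e value
wide <- with(df, tapply(e, list(fid, disease), FUN = function(x) x[1]))

# Columns come out named "0" and "1"; prefix them to get e0 and e1
colnames(wide) <- paste0("e", colnames(wide))
wide
```

The result is a matrix rather than a tibble, with fid as row names.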
I have a dataframe that looks like this, but there will be many more IDs:
# Groups: ID [1]
ID ARS stim
<int> <int> <chr>
1 3 0 1
2 3 4 2
3 3 2 3
4 3 3 4
5 3 1 5
6 3 0 6
7 3 2 10
8 3 4 11
9 3 0 12
10 3 3 13
11 3 2 14
12 3 2 15
I would like to calculate the sum of the absolute differences between paired values in ARS: the value at stim=1 against the value at stim=10, plus stim=2 against stim=11, and so on.
Any good solutions are appreciated!
The desired output calculation is:
abs(0-2) + abs(4-4) + abs(2-0) + abs(3-3) + abs(1-2) + abs(0-2)
Hence, 2+0+2+0+1+2
Output for ID==3: 7
A possible solution:
library(dplyr)
df <- structure(list(ID = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), ARS = c(0, 4, 2, 3, 1, 0, 2, 4, 0, 3, 2, 2), stim = c(1, 2, 3, 4, 5, 6,
10, 11, 12, 13, 14, 15)), row.names = c(NA, -12L), class = "data.frame")
df %>%
  group_by(ID) %>%
  # match() looks up each stim value explicitly instead of relying on recycling
  summarise(value = sum(abs(ARS[match(1:6, stim)] - ARS[match(9 + 1:6, stim)])),
            .groups = "drop") %>%
  pull(value)
#> [1] 7
Let's say I have the following dataframe:
dat <- tribble(
~V1, ~V2,
2, -3,
3, 2,
1, 3,
3, -4,
5, 1,
3, 2,
1, -4,
3, 4,
4, 1,
3, -5,
4, 2,
3, 4
)
How can I replace negative values with NA using na_if()? I know how to do this using ifelse(), but I can't come up with a correct condition for na_if():
> dat %>%
+ mutate(V2 = ifelse(V2 < 0, NA, V2))
# A tibble: 12 x 2
V1 V2
<dbl> <dbl>
1 2 NA
2 3 2
3 1 3
4 3 NA
5 5 1
6 3 2
7 1 NA
8 3 4
9 4 1
10 3 NA
11 4 2
12 3 4
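For context, na_if(x, y) replaces the elements of x that are equal to y, so it takes a value to match rather than a logical condition such as V2 < 0. A minimal sketch of one alternative that does accept a condition, base R's replace() inside mutate() (the name res is illustrative, and a plain data.frame stands in for the tribble above):

```r
library(dplyr)

dat <- data.frame(
  V1 = c(2, 3, 1, 3, 5, 3, 1, 3, 4, 3, 4, 3),
  V2 = c(-3, 2, 3, -4, 1, 2, -4, 4, 1, -5, 2, 4)
)

# replace() sets the elements selected by the logical index to NA
res <- dat %>%
  mutate(V2 = replace(V2, V2 < 0, NA))
res
```

This gives the same V2 column as the ifelse() version in the question.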
I'm trying to compute ICC values for each subject for the table below, but group_by() is not working as I think it should.
SubID Rate1 Rate2
1 1 2 5
2 1 2 4
3 1 2 5
4 2 3 4
5 2 4 1
6 2 5 1
7 2 2 2
8 3 2 5
9 3 3 5
The code I am running is as follows:
df %>%
group_by(SubID) %>%
summarise(icc = DescTools::ICC(.)$results[3, 2])
and the output:
# A tibble: 3 x 2
SubID icc
<dbl> <dbl>
1 1 -0.247
2 2 -0.247
3 3 -0.247
It seems that summarise is not being applied according to groups, but to the entire dataset. I'm not sure what is going on.
dput()
structure(list(SubID = c(1, 1, 1, 2, 2, 2, 2, 3, 3), Rate1 = c(2,
2, 2, 3, 4, 5, 2, 2, 3), Rate2 = c(5, 4, 5, 4, 1, 1, 2, 5, 5)), class = "data.frame", row.names = c(NA,
-9L))
I'm not terribly familiar with DescTools, but here is a potential solution that uses a nest() / map() combo:
library(DescTools)
library(tidyverse)
df <- structure(
list(SubID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
Rate1 = c(2, 2, 2, 3, 4, 5, 2, 2, 3),
Rate2 = c(5, 4, 5, 4, 1, 1, 2, 5, 5)),
class = "data.frame", row.names = c(NA, -9L)
)
df %>%
nest(ICC3 = -SubID) %>%
mutate(ICC3 = map_dbl(ICC3, ~ ICC(.x)[["results"]] %>%
filter(type == "ICC3") %>%
pull(est)))
#> # A tibble: 3 x 2
#> SubID ICC3
#> <dbl> <dbl>
#> 1 1 2.83e-15
#> 2 2 -5.45e- 1
#> 3 3 -6.66e-16
Created on 2021-03-08 by the reprex package (v0.3.0)
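As for why the original attempt returned the same value for every subject: inside a magrittr pipe, `.` is the object piped into summarise(), which is the whole data frame, not the current group's rows. A small sketch that makes this visible, using nrow() as a stand-in for the ICC call so it runs without DescTools (the names whole and per_group are illustrative):

```r
library(dplyr)

df <- data.frame(
  SubID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  Rate1 = c(2, 2, 2, 3, 4, 5, 2, 2, 3),
  Rate2 = c(5, 4, 5, 4, 1, 1, 2, 5, 5)
)

# `.` is the full piped data frame, so every group reports all 9 rows
whole <- df %>%
  group_by(SubID) %>%
  summarise(n = nrow(.))

# Splitting first hands each subject's rows to the function separately
per_group <- sapply(split(df[c("Rate1", "Rate2")], df$SubID), nrow)
```

whole$n is 9 for every subject, while per_group gives 3, 4 and 2. The nest() / map() answer works for the same reason: each call only sees one subject's rows.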
I am trying to compute a balance column.
So, to show an example, I want to go from this:
df <- data.frame(group = c("A", "A", "A", "A", "A"),
start = c(5, 0, 0, 0, 0),
receipt = c(1, 5, 6, 4, 6),
out = c(4, 5, 3, 2, 5))
> df
group start receipt out
1 A 5 1 4
2 A 0 5 5
3 A 0 6 3
4 A 0 4 2
5 A 0 6 5
to creating a new balance column like the following
> dfb
group start receipt out balance
1 A 5 1 4 2
2 A 0 5 5 2
3 A 0 6 3 5
4 A 0 4 2 7
5 A 0 6 5 8
I tried the following, but it isn't working:
dfc <- df %>%
group_by(group) %>%
mutate(balance = if_else(row_number() == 1, start + receipt - out, (lag(balance) + receipt) - out)) %>%
ungroup()
Would really appreciate some help with this. Thanks!
You could use cumsum() (a base R function) inside dplyr's mutate(). Note: I had to change your initial df table to match the one in your required result because you had different data in "out".
df <- data.frame(group = c("A", "A", "A", "A", "A"),
start = c(5, 0, 0, 0, 0),
receipt = c(1, 5, 6, 4, 6),
out = c(4, 5, 3, 2, 5))
dfc <- df %>%
group_by(group) %>%
mutate(balance=cumsum(start+receipt-out))
Source: local data frame [5 x 5]
Groups: group [1]
group start receipt out balance
<fctr> <dbl> <dbl> <dbl> <dbl>
1 A 5 1 4 2
2 A 0 5 5 2
3 A 0 6 3 5
4 A 0 4 2 7
5 A 0 6 5 8
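The attempt in the question fails because balance does not exist yet when lag(balance) is evaluated inside the same mutate() call. Since each balance is the previous balance plus receipt minus out, and start is non-zero only on the first row, the running balance is just a cumulative sum of the net change. A base R sketch of the same idea with ave(), assuming the df from the question (the name net is illustrative):

```r
df <- data.frame(group   = c("A", "A", "A", "A", "A"),
                 start   = c(5, 0, 0, 0, 0),
                 receipt = c(1, 5, 6, 4, 6),
                 out     = c(4, 5, 3, 2, 5))

# Net change per row: opening amount plus receipts minus outgoings
net <- df$start + df$receipt - df$out

# ave() applies cumsum() within each group and returns a vector aligned to the rows
df$balance <- ave(net, df$group, FUN = cumsum)
df
```

With more than one group, each group's running total restarts from its own first row.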