I'm trying to compute ICC values for each subject for the table below, but group_by() is not working as I think it should.
SubID Rate1 Rate2
1 1 2 5
2 1 2 4
3 1 2 5
4 2 3 4
5 2 4 1
6 2 5 1
7 2 2 2
8 3 2 5
9 3 3 5
The code I am running is as follows:
df %>%
group_by(SubID) %>%
summarise(icc = DescTools::ICC(.)$results[3, 2])
and the output:
# A tibble: 3 x 2
SubID icc
<dbl> <dbl>
1 1 -0.247
2 2 -0.247
3 3 -0.247
It seems that summarise is not being applied according to groups, but to the entire dataset. I'm not sure what is going on.
dput()
structure(list(SubID = c(1, 1, 1, 2, 2, 2, 2, 3, 3), Rate1 = c(2,
2, 2, 3, 4, 5, 2, 2, 3), Rate2 = c(5, 4, 5, 4, 1, 1, 2, 5, 5)), class = "data.frame", row.names = c(NA,
-9L))
Not terribly familiar with library(DescTools) but here is a potential solution that utilizes a nest() / map() combo:
library(DescTools)
library(tidyverse)
df <- structure(
list(SubID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
Rate1 = c(2, 2, 2, 3, 4, 5, 2, 2, 3),
Rate2 = c(5, 4, 5, 4, 1, 1, 2, 5, 5)),
class = "data.frame", row.names = c(NA, -9L)
)
df %>%
nest(ICC3 = -SubID) %>%
mutate(ICC3 = map_dbl(ICC3, ~ ICC(.x)[["results"]] %>%
filter(type == "ICC3") %>%
pull(est)))
#> # A tibble: 3 x 2
#> SubID ICC3
#> <dbl> <dbl>
#> 1 1 2.83e-15
#> 2 2 -5.45e- 1
#> 3 3 -6.66e-16
Created on 2021-03-08 by the reprex package (v0.3.0)
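As for why the original attempt returns the same value for every group: inside summarise(), the magrittr placeholder `.` stands for the whole piped data frame, not the current group's rows. A minimal sketch that makes this visible with row counts instead of ICC (the column names rows_seen_by_dot and rows_in_group are made up for illustration):

```r
library(dplyr)

df <- data.frame(
  SubID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  Rate1 = c(2, 2, 2, 3, 4, 5, 2, 2, 3),
  Rate2 = c(5, 4, 5, 4, 1, 1, 2, 5, 5)
)

# `.` is filled in by the pipe with the full (grouped) data frame,
# whereas n() is evaluated by dplyr once per group.
row_counts <- df %>%
  group_by(SubID) %>%
  summarise(rows_seen_by_dot = nrow(.), rows_in_group = n())

row_counts
```

rows_seen_by_dot is 9 for every SubID, while rows_in_group is 3, 4 and 2. Any function called on `.` therefore sees all nine rows, which is why nesting the data per group first behaves differently.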
Related
I would like to reshape the sample data below to get the output shown in the table. How can I do that? The idea is to split column e into two columns according to disease: values with disease 0 in one column and values with disease 1 in the other. Thanks in advance.
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), fid = c(1,
1, 2, 2, 3, 3, 4, 4, 5, 5), disease = c(0, 1, 0, 1, 1, 0, 1, 0, 0,
1), e = c(3, 2, 6, 1, 2, 5, 2, 3, 1, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
library(tidyverse)
df %>%
pivot_wider(id_cols = fid, names_from = disease, values_from = e, names_prefix = 'e') %>%
select(-fid)
e0 e1
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1
If you want the columns named e1 and e2 instead, you could do:
df %>%
pivot_wider(id_cols = fid, names_from = disease, values_from = e,
names_glue = 'e{disease + 1}') %>%
select(-fid)
# A tibble: 5 x 2
e1 e2
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1
We could use lead() combined with ifelse() statements for this:
library(dplyr)
df %>%
mutate(e2 = lead(e)) %>%
filter(row_number() %% 2 == 1) %>%
mutate(e1 = ifelse(disease==1, e2,e),
e2 = ifelse(disease==0, e2,e)) %>%
select(e1, e2)
e1 e2
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1
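Another way to get the same two columns is to pick the values per fid by disease status directly. This is only a sketch and it assumes every fid has exactly one row with disease 0 and exactly one with disease 1, as in the sample data; the name wide is arbitrary:

```r
library(dplyr)

df <- data.frame(
  id = 1:10,
  fid = rep(1:5, each = 2),
  disease = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  e = c(3, 2, 6, 1, 2, 5, 2, 3, 1, 1)
)

# For each fid, take the e value of the disease-0 row and of the disease-1 row.
# Assumes exactly one row of each kind per fid.
wide <- df %>%
  group_by(fid) %>%
  summarise(e0 = e[disease == 0], e1 = e[disease == 1], .groups = "drop") %>%
  select(-fid)

wide
```

Unlike the lead() version, this does not depend on the two rows of a pair being adjacent.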
I have a dataframe that looks like this, but there will be many more IDs:
# Groups: ID [1]
ID ARS stim
<int> <int> <chr>
1 3 0 1
2 3 4 2
3 3 2 3
4 3 3 4
5 3 1 5
6 3 0 6
7 3 2 10
8 3 4 11
9 3 0 12
10 3 3 13
11 3 2 14
12 3 2 15
I would like to calculate the sum of the absolute difference abs() between the values in ARS, e.g. for stim=1 and stim=10 plus for stim=2 and stim=11 and so on.
Any good solutions are appreciated!
The desired output calculation is:
abs(0-2) + abs(4-4) + abs(2-0) + abs(3-3) + abs(1-2) + abs(0-2)
Hence, 2+0+2+0+1+2
Output for ID==3: 7
A possible solution:
library(dplyr)
df <- structure(list(ID = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), ARS = c(0, 4, 2, 3, 1, 0, 2, 4, 0, 3, 2, 2), stim = c(1, 2, 3, 4, 5, 6,
10, 11, 12, 13, 14, 15)), row.names = c(NA, -12L), class = "data.frame")
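# match() looks up each stim value explicitly instead of relying on
# vector recycling, and sum() inside summarise() gives one total per ID
df %>%
  group_by(ID) %>%
  summarise(value = sum(abs(ARS[match(1:6, stim)] - ARS[match(9 + 1:6, stim)])),
            .groups = "drop") %>%
  pull(value)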
#> [1] 7
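Since the question mentions many more IDs, here is a sketch of the same calculation that returns one total per ID. The rows for ID 4 are invented purely for illustration, and it assumes every ID has all of stim 1 to 6 and 10 to 15:

```r
library(dplyr)

df <- data.frame(
  ID   = rep(c(3, 4), each = 12),
  ARS  = c(0, 4, 2, 3, 1, 0, 2, 4, 0, 3, 2, 2,   # ID 3, from the question
           1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),  # ID 4, hypothetical values
  stim = rep(c(1:6, 10:15), times = 2)
)

# Pair stim 1..6 with stim 10..15 inside each ID and sum the absolute differences.
per_id <- df %>%
  group_by(ID) %>%
  summarise(value = sum(abs(ARS[match(1:6, stim)] - ARS[match(10:15, stim)])),
            .groups = "drop")

per_id
```

For ID 3 this reproduces the 7 from the question; the made-up ID 4 gives 6.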
I have a dataset with IDs and associated timepoints. I want to keep only the IDs that have a specific combination of timepoints. If I filter using %in% or |, I also get IDs that have only some of those timepoints. How do I do this in R?
ID Timepoint
1 1
1 6
1 12
2 1
3 1
3 6
3 12
3 18
4 1
4 6
4 12
I want to filter IDs that have timepoints 1,6 and 12 and exclude other IDs.
Result would be IDs 1,3 and 4
library(dplyr)
library(tidyr) # complete() comes from tidyr
df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))
df %>%
filter(Timepoint %in% c(1, 6, 12)) %>%
mutate(indicator = 1) %>%
group_by(ID) %>%
complete(Timepoint = c(1, 6, 12)) %>%
filter(!ID %in% pull(filter(., is.na(indicator)), ID)) %>%
select(indicator)
Output:
# A tibble: 9 × 2
# Groups: ID [3]
ID indicator
<dbl> <dbl>
1 1 1
2 1 1
3 1 1
4 3 1
5 3 1
6 3 1
7 4 1
8 4 1
9 4 1
We can use
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(c(1, 6, 12) %in% Timepoint)) %>%
ungroup
Output:
# A tibble: 10 x 2
ID Timepoint
<dbl> <dbl>
1 1 1
2 1 6
3 1 12
4 3 1
5 3 6
6 3 12
7 3 18
8 4 1
9 4 6
10 4 12
In your data, ID 2 has time point 1. So if you keep every ID that has any of the time points 1, 6 or 12, the result is 1, 2, 3, 4 instead of 1, 3, 4:
ids <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4)
time_points <- c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
dat <- data.frame(ids, time_points)
unique(dat$ids[dat$time_points %in% c(1, 6, 12)])
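To require all three time points rather than any of them, a base R sketch is to test each ID's set of time points separately (the names has_all and complete_ids are arbitrary):

```r
ids <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4)
time_points <- c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
dat <- data.frame(ids, time_points)

# Split the time points by ID, then check that 1, 6 and 12 are all present.
has_all <- vapply(
  split(dat$time_points, dat$ids),
  function(tp) all(c(1, 6, 12) %in% tp),
  logical(1)
)

# IDs whose time points include the full combination.
complete_ids <- as.numeric(names(has_all)[has_all])
complete_ids
```

This keeps IDs 1, 3 and 4; ID 3 still qualifies although it has the extra time point 18.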
I have a dataframe with Markets, Retailers and Sales. I need to bin the Retailers within each Market into 5 quantiles.
Example:
dataframe <- structure(list(Market = c(1, 1, 1, 2, 2, 2), Retailer = c(1,
2, 3, 4, 5, 6), Sales = c(5, 10, 25, 5, 10, 25), Quantile = c(1,
2, 3, 1, 2, 3)), class = "data.frame", row.names = c(NA, -6L))
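One approach is to use group_by() and ntile() from dplyr. The example below uses 4 bins; pass 5 as the second argument to ntile() to get the five bins asked for: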
library(dplyr)
dataframe %>%
group_by(Market) %>%
mutate(Quantile = ntile(Sales, 4))
# A tibble: 150 x 4
# Groups: Market [3]
Market Retailer Sales Quantile
<int> <int> <dbl> <int>
1 1 1 16804 1
2 1 2 80752 4
3 1 3 38494 2
4 1 4 32773 2
5 1 5 60210 3
# … with 145 more rows
Data
set.seed(3)
dataframe <- data.frame(Market = rep(1:3, each = 50),
Retailer = rep(1:50, times = 3),
Sales = round(runif(150,0,100000),0))
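For the five bins the question asks for, a small sketch on made-up data (two markets with ten retailers each; the values and the names retail and binned are hypothetical) could look like this:

```r
library(dplyr)

# Hypothetical data: 2 markets, 10 retailers each, distinct sales values.
retail <- data.frame(
  Market   = rep(1:2, each = 10),
  Retailer = 1:20,
  Sales    = c(seq(10, 100, by = 10), seq(200, 20, by = -20))
)

# ntile() ranks Sales within each Market and cuts the ranks into 5 equal-sized bins.
binned <- retail %>%
  group_by(Market) %>%
  mutate(Quantile = ntile(Sales, 5)) %>%
  ungroup()

binned
```

With ten retailers per market, each bin holds two retailers. Note that ntile() produces equal-sized rank buckets, which is not the same as cutting at quantile breakpoints when there are ties.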
I have the following data:
df <- data.frame(
x = c(1, 4, 3, 4, 4, 3),
y = c(2, 3, 4, 4, 2, 3)
)
I tried this code:
library(tidyverse)
df %>%
filter_if(~ is.numeric(.), all_vars(. %in% c('3', '4')))
x y
1 4 3
2 3 4
3 4 4
4 3 3
But, the expected result is:
x y
1 3 3
2 4 4
How can I do this?
A different approach:
require(tidyverse)
df <- data.frame(
x = c(1, 4, 3, 4, 4, 3),
y = c(2, 3, 4, 4, 2, 3),
z = letters[1:6]
)
df %>%
filter(apply(.,1,function(x) length(unique(x[grepl('[0-9]',x)]))==1))
gives:
x y z
1 4 4 d
2 3 3 f
I have added a non-numeric column to the example data, to illustrate this solution.
Not a filter_if() possibility, but essentially following that logic:
df %>%
filter(rowMeans(select_if(., is.numeric) == pmax(!!!select_if(., is.numeric))) == 1)
x y z
1 4 4 d
2 3 3 f
Sample data:
df <- data.frame(x = c(1, 4, 3, 4, 4, 3),
y = c(2, 3, 4, 4, 2, 3),
z = letters[1:6],
stringsAsFactors = FALSE)
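A base R sketch of the same idea, which also applies the 3-or-4 restriction from the question. It assumes the goal is rows where all numeric columns hold one identical value and that value is 3 or 4; the helper names num, same_value, allowed and res are arbitrary:

```r
df <- data.frame(x = c(1, 4, 3, 4, 4, 3),
                 y = c(2, 3, 4, 4, 2, 3),
                 z = letters[1:6],
                 stringsAsFactors = FALSE)

# Keep only the numeric columns for the row-wise comparison.
num <- df[vapply(df, is.numeric, logical(1))]

# TRUE where every numeric column in the row has the same value.
same_value <- apply(num, 1, function(r) length(unique(r)) == 1)

# TRUE where that shared value is one of the allowed ones; since the row's
# numeric values are all equal, checking the first numeric column is enough.
allowed <- num[[1]] %in% c(3, 4)

res <- df[same_value & allowed, ]
res
```

This returns the rows with x = 4, y = 4 and x = 3, y = 3, in their original order.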