Suppose I have the following dataset:
id1 <- c(1,1,1,1,2,2,2,2,1,1,1,1)
dates <- c("a","a","a","a","b","b","b","b","c","c","c","c")
x <- c(NA,0,NA,NA,NA,NA,0,NA,NA,NA,NA,0)
df <- data.frame(id1,dates,x)
My objective is to add a new column that counts the position of each observation relative to the 0, for every combination of id1 and dates. This would yield the following outcome:
desired_result <- c(-1,0,1,2,-2,-1,0,1,-3,-2,-1,0)
Any help is appreciated.
library(dplyr)
df %>%
  group_by(id1, dates) %>%
  mutate(x = row_number() - which(x == 0))
id1 dates x
1 1 a -1
2 1 a 0
3 1 a 1
4 1 a 2
5 2 b -2
6 2 b -1
7 2 b 0
8 2 b 1
9 1 c -3
10 1 c -2
11 1 c -1
12 1 c 0
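With dplyr 1.1.0 or later, using per-operation grouping with .by (grouping on both id1 and dates, as the question asks):
df %>%
  mutate(x = row_number() - which(x == 0), .by = c(id1, dates))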
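For comparison, here is a minimal base R sketch of the same idea. It assumes each (id1, dates) group contains exactly one 0; the column name seq_around_zero is made up for illustration.

```r
id1 <- c(1,1,1,1,2,2,2,2,1,1,1,1)
dates <- c("a","a","a","a","b","b","b","b","c","c","c","c")
x <- c(NA,0,NA,NA,NA,NA,0,NA,NA,NA,NA,0)
df <- data.frame(id1, dates, x)

# One label per (id1, dates) combination; drop = TRUE removes unused combinations
grp <- interaction(df$id1, df$dates, drop = TRUE)

# Within each group: row position minus the position of the 0
# (which() ignores the NA results of v == 0)
df$seq_around_zero <- ave(df$x, grp,
                          FUN = function(v) seq_along(v) - which(v == 0))

df$seq_around_zero
# -1 0 1 2 -2 -1 0 1 -3 -2 -1 0
```

If a group could contain no 0 or several, which() would return a vector of a different length and the subtraction would need extra handling.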
I have the following dataframe:
df <-read.table(header=TRUE, text="id code
1 A
1 B
1 C
2 A
2 A
2 A
3 A
3 B
3 A")
Per id, I would love to find those individuals that have at least 2 conditions, namely:
conditionA = "A"
conditionB = "B"
conditionC = "C"
and create a new column, "index", that is 1 if two or more conditions are met and 0 otherwise:
df_output <-read.table(header=TRUE, text="id code index
1 A 1
1 B 1
1 C 1
2 A 0
2 A 0
2 A 0
3 A 1
3 B 1
3 A 1")
So far I have tried the following:
df_output = df %>%
group_by(id) %>%
mutate(index = ifelse(grepl(conditionA|conditionB|conditionC, code), 1, 0))
and as you can see I am struggling to get the threshold count into the code.
You can create a vector of conditions, and then use %in% and sum to count the number of occurrences in each group. Use + (or ifelse) to convert logical into 1 and 0:
conditions = c("A", "B", "C")
df %>%
  group_by(id) %>%
  mutate(index = +(sum(unique(code) %in% conditions) >= 2))
id code index
1 1 A 1
2 1 B 1
3 1 C 1
4 2 A 0
5 2 A 0
6 2 A 0
7 3 A 1
8 3 B 1
9 3 A 1
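The same counting logic can be sketched in base R with tapply, in case dplyr is not available. The helper name n_met is hypothetical.

```r
df <- read.table(header = TRUE, text = "id code
1 A
1 B
1 C
2 A
2 A
2 A
3 A
3 B
3 A")
conditions <- c("A", "B", "C")

# Per id: how many of the listed conditions occur at least once
n_met <- tapply(df$code, df$id, function(v) sum(conditions %in% v))

# Map the per-id count back to every row and apply the threshold
df$index <- as.integer(n_met[as.character(df$id)] >= 2)

df$index
# 1 1 1 0 0 0 1 1 1
```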
You could use n_distinct(), which is a faster and more concise equivalent of length(unique(x)).
df %>%
  group_by(id) %>%
  mutate(index = +(n_distinct(code) >= 2)) %>%
  ungroup()
# # A tibble: 9 × 3
# id code index
# <int> <chr> <int>
# 1 1 A 1
# 2 1 B 1
# 3 1 C 1
# 4 2 A 0
# 5 2 A 0
# 6 2 A 0
# 7 3 A 1
# 8 3 B 1
# 9 3 A 1
You can use intersect() to find which of the conditions occur in each group, and then check whether the result has at least the required length (e.g. 2).
conditions = c('A', 'B', 'C')
df_output2 =
  df %>%
  group_by(id) %>%
  mutate(index = as.integer(length(intersect(code, conditions)) >= 2))
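One difference between the approaches above may matter on other data: n_distinct(code) counts every distinct code in the group, while the %in% and intersect() versions count only codes listed in conditions. A small base R sketch with hypothetical values (not taken from the question) shows the gap:

```r
conditions <- c("A", "B")    # hypothetical: "C" is not a condition here
codes <- c("A", "C", "C")    # hypothetical group that meets only one condition

n_all    <- length(unique(codes))               # what n_distinct(codes) counts
n_listed <- length(intersect(codes, conditions)) # only codes that are conditions

n_all     # 2, so a ">= 2" threshold would be met
n_listed  # 1, so the threshold would not be met
```

On the question's data every code is one of the conditions, so all three answers agree there.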
I'm trying to do some complex calculations and part of the code requires that I parse a comma separated entry and count the number of values that are more than 0.
Example input data:
a <- c(0,0,3,0)
b <- c(4,4,0,1)
c <- c("3,4,3", "2,1", 0, "5,8")
x <- data.frame(a, b, c)
x
a b c
1 0 4 3,4,3
2 0 4 2,1
3 3 0 0
4 0 1 5,8
The column that I need to parse, c, is a factor; all other columns are numeric. The number of comma-separated values varies; in this example the count of values greater than 0 ranges from 0 to 3.
The desired output would look like this:
x$c_occur <- c(3, 2, 0, 2)
x
a b c c_occur
1 0 4 3,4,3 3
2 0 4 2,1 2
3 3 0 0 0
4 0 1 5,8 2
Where c_occur lists the number of occurrences > 0 in the c column.
I was thinking something like this would work... but I can't figure it out.
library(dplyr)
x_desired <- x %>%
mutate(c_occur = count(strsplit(c, ","), > 0))
We can use str_count from stringr. The pattern counts every number that starts with a nonzero digit:
library(stringr)
library(dplyr)
x %>%
  mutate(c_occur = str_count(c, '[1-9]\\d*'))
# a b c c_occur
#1 0 4 3,4,3 3
#2 0 4 2,1 2
#3 3 0 0 0
#4 0 1 5,8 2
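To see what the pattern matches, here is a rough base R equivalent using gregexpr and regmatches. It assumes the entries are non-negative integers; a decimal such as "1.5" would be counted twice by this pattern.

```r
c_col <- c("3,4,3", "2,1", "0", "5,8")

# Extract every run of digits that starts with 1-9 (so a bare "0" never matches)
matches <- regmatches(c_col, gregexpr("[1-9][0-9]*", c_col))

# Number of matches per entry
c_occur <- lengths(matches)
c_occur
# 3 2 0 2
```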
Alternatively, split 'c' with strsplit, then loop over the resulting list and sum the logical vector in each element to get the count:
library(purrr)
x %>%
  mutate(c_occur = map_int(strsplit(as.character(c), ","),
                           ~ sum(as.integer(.x) > 0)))
# a b c c_occur
#1 0 4 3,4,3 3
#2 0 4 2,1 2
#3 3 0 0 0
#4 0 1 5,8 2
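The split-and-count step can also be written without purrr. This is a sketch in base R with vapply; stringsAsFactors = TRUE is set only to mimic the factor column described in the question.

```r
x <- data.frame(a = c(0, 0, 3, 0),
                b = c(4, 4, 0, 1),
                c = c("3,4,3", "2,1", "0", "5,8"),
                stringsAsFactors = TRUE)

# Convert the factor to character before splitting on commas
parts <- strsplit(as.character(x$c), ",", fixed = TRUE)

# For each row, count the pieces that are greater than 0
x$c_occur <- vapply(parts, function(p) sum(as.numeric(p) > 0), integer(1))

x$c_occur
# 3 2 0 2
```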
Or we can split the values into rows with separate_rows and then do a group_by and summarise:
library(tidyr)
x %>%
  mutate(rn = row_number()) %>%
  separate_rows(c, convert = TRUE) %>%
  group_by(rn) %>%
  summarise(c_occur = sum(c > 0)) %>%
  select(-rn) %>%
  bind_cols(x, .)
# A tibble: 4 x 4
# a b c c_occur
# <dbl> <dbl> <fct> <int>
#1 0 4 3,4,3 3
#2 0 4 2,1 2
#3 3 0 0 0
#4 0 1 5,8 2
I have a longitudinal dataframe with a lot of missing values that looks like this.
ID = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
date = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
cond = c(0,0,0,1,0,0,0,0,1,0,0,0,0,0,0)
var = c(1, NA , 2, 0,NA, NA, 3, NA,0, NA, 2, NA, 1,NA,NA)
df = data.frame(ID, date, cond,var)
I would like to carry forward the last observation based on two conditions:
1) when cond = 0, the higher value of the variable of interest should be carried forward.
2) when cond = 1, the lower value of the variable of interest should be carried forward.
Does anyone have an idea on how I could do this in an elegant way?
The final dataset should look like this
ID = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
date = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
cond = c(0,0,0,1,0,0,0,0,1,0,0,0,0,0,0)
var = c(1, 1 , 2, 0, 0, NA, 3, 3, 0, 0,2,2,2,2,2)
final = data.frame(ID, date, cond,var)
So far I have been able to carry forward the last observation, but not to impose the conditions:
library(dplyr)
library(zoo)
df <- df %>%
  group_by(ID) %>%
  mutate(var = na.locf(var, na.rm = FALSE))
Any suggestion is welcome.
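This is a use case for accumulate2 from purrr:
library(dplyr)
library(purrr)
df %>%
  group_by(ID) %>%
  # z: value carried so far, x: next var, y: next cond
  mutate(d = unlist(accumulate2(var, cond[-1], function(z, x, y) if (y) min(z, x, na.rm = TRUE) else max(z, x, na.rm = TRUE))))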
# A tibble: 15 x 5
# Groups: ID [3]
ID date cond var d
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 1 1
2 1 2 0 NA 1
3 1 3 0 2 2
4 1 4 1 0 0
5 1 5 0 NA 0
6 2 1 0 NA NA
7 2 2 0 3 3
8 2 3 0 NA 3
9 2 4 1 0 0
10 2 5 0 NA 0
11 3 1 0 2 2
12 3 2 0 NA 2
13 3 3 0 1 2
14 3 4 0 NA 2
15 3 5 0 NA 2
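To make the accumulation explicit, the same rule can be sketched as a plain loop in base R. The function name carry is hypothetical, and the skip for two missing values is an addition of this sketch (min and max with na.rm = TRUE would otherwise return Inf or -Inf with a warning).

```r
ID   <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
cond <- c(0,0,0,1,0,0,0,0,1,0,0,0,0,0,0)
var  <- c(1, NA, 2, 0, NA, NA, 3, NA, 0, NA, 2, NA, 1, NA, NA)
df <- data.frame(ID, cond, var)

# Walk through one ID's values, combining the carried value with the next one:
# lower of the two when cond == 1, higher when cond == 0
carry <- function(var, cond) {
  out <- var
  for (i in seq_along(var)[-1]) {
    pair <- c(out[i - 1], var[i])
    if (all(is.na(pair))) next   # nothing to carry yet, leave NA
    out[i] <- if (cond[i] == 1) min(pair, na.rm = TRUE) else max(pair, na.rm = TRUE)
  }
  out
}

# Apply the rule separately to the rows of each ID
df$d <- NA_real_
for (rows in split(seq_len(nrow(df)), df$ID)) {
  df$d[rows] <- carry(df$var[rows], df$cond[rows])
}

df$d
# 1 1 2 0 0 NA 3 3 0 0 2 2 2 2 2
```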
If I understand correctly, this is what you are after:
ID = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
date = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
cond = c(0,0,0,1,0,0,0,0,1,0,0,0,0,0,0)
var = c(1, NA , 2, 0,NA, NA, 3, NA,0, NA, 2, NA, 1,NA,NA)
df = data.frame(ID, date, cond,var)
Using case_when you can do some conditional checks. I'm unsure whether you mean to return the minimum over the whole "ID" group, but this looks at the condition and then takes the previous (lag) or next (lead) row's value when var is missing. Note that it only looks one row away and does not group by ID:
library(dplyr)
df %>%
  mutate(var_imput = case_when(
    cond == 0 & is.na(var) ~ lag(x = var, n = 1, default = NA),
    cond == 1 & is.na(var) ~ lead(x = var, n = 1, default = NA),
    TRUE ~ var
  ))
Which yields:
ID date cond var var_imput
1 1 1 0 1 1
2 1 2 0 NA 1
3 1 3 0 2 2
4 1 4 1 0 0
5 1 5 0 NA 0
6 2 1 0 NA NA
7 2 2 0 3 3
8 2 3 0 NA 3
9 2 4 1 0 0
10 2 5 0 NA 0
11 3 1 0 2 2
12 3 2 0 NA 2
13 3 3 0 1 1
14 3 4 0 NA 1
15 3 5 0 NA NA
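The NA left in the last row comes from the one-row reach of lag. A tiny base R sketch on a made-up vector shows the effect; prev stands in for what lag(v) returns.

```r
v <- c(1, NA, NA)                  # hypothetical values with two NAs in a row

prev <- c(NA, head(v, -1))         # previous value for each position: NA 1 NA
filled <- ifelse(is.na(v), prev, v)

filled
# 1 1 NA  (the second NA stays, because its previous value was also NA)
```

Filling longer gaps needs a true carry-forward, such as na.locf from zoo as used in the question.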
If you want to group by ID then you could generate an impute table by ID, then join it with the original table like this:
# Generate impute table
library(tidyr)
input_table <- df %>%
  group_by(ID) %>%
  summarise(min = min(var, na.rm = TRUE),
            max = max(var, na.rm = TRUE)) %>%
  gather(cond, value, -ID) %>%
  # per the question: cond = 1 takes the lower value, cond = 0 the higher
  mutate(cond = ifelse(cond == "min", 1, 0))
# Join and impute missing
df %>%
  left_join(input_table, by = c("ID", "cond")) %>%
  mutate(var_imput = ifelse(is.na(var), value, var))
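The per-ID lookup can also be sketched in base R without a join. This follows the question's rule (cond = 1 takes the lower value, cond = 0 the higher); the names lo, hi, key and fill are hypothetical.

```r
ID   <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
cond <- c(0,0,0,1,0,0,0,0,1,0,0,0,0,0,0)
var  <- c(1, NA, 2, 0, NA, NA, 3, NA, 0, NA, 2, NA, 1, NA, NA)
df <- data.frame(ID, cond, var)

# Minimum and maximum of var within each ID, ignoring NAs
lo <- tapply(df$var, df$ID, min, na.rm = TRUE)
hi <- tapply(df$var, df$ID, max, na.rm = TRUE)

# Pick the per-ID value that each row's cond asks for
key  <- as.character(df$ID)
fill <- ifelse(df$cond == 1, lo[key], hi[key])

# Use it only where var is missing
df$var_imput <- ifelse(is.na(df$var), fill, df$var)

df$var_imput
# 1 2 2 0 2 3 3 3 0 3 2 2 1 2 2
```

Note that this fills each gap with a summary over the whole ID rather than carrying the last observation forward, so it does not reproduce the final dataset shown in the question.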