If-else condition in R based on different columns and rows

I have a dataset with an ID column and multiple visits for every ID. I am trying to create a new variable, Status, based on the Visit and Value columns. The conditions are as follows:
For visits 1, 2 and 3, if the values are 1, 1, 1, then Status is 1
For visits 1, 2 and 3, if the values are 0, 1, 1, then Status is 0
For visits 1, 2 and 3, if the values are 0, 0, 0, then Status is 0
How do I specify these conditions in R?
Below is a sample dataset:
ID Visit Value
1 1 1
1 2 1
1 3 1
2 1 1
2 2 0
2 3 0
3 1 0
3 2 0
3 3 0
4 1 0
4 2 1
4 3 1
Result dataset:
ID Visit Value Status
1 1 1 1
1 2 1 1
1 3 1 1
2 1 1 0
2 2 0 0
2 3 0 0
3 1 0 0
3 2 0 0
3 3 0 0
4 1 0 0
4 2 1 0
4 3 1 0

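I would try something like this (assuming your initial table is called df):
status <- c()
for (i in unique(df$ID)) { # loop over every ID instead of a hard-coded 1:4
  vals <- df[df$ID == i, "Value"] # column names are case-sensitive: Value, not value
  if (sum(vals) == 3) status <- c(status, rep(1, length(vals))) # all three visits are 1
  if (sum(vals) != 3) status <- c(status, rep(0, length(vals)))
}
df <- cbind(df, Status = status) # assumes the rows are sorted by ID
I hope it helps.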
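A vectorised alternative to the loop, sketched under the assumption that df has the ID and Value columns shown in the question and that Status should be 1 only when every visit of an ID has Value 1 (this matches the expected result, including ID 2 with values 1, 0, 0, which gets 0). It uses base R's ave(), which applies a function within each ID group and repeats the result on every row of that group, so the rows do not need to be sorted:

```r
# Sample data from the question
df <- data.frame(
  ID    = rep(1:4, each = 3),
  Visit = rep(1:3, times = 4),
  Value = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)
)

# For each ID, return 1 if all of its Values are 1, otherwise 0;
# ave() recycles that single number to every row of the ID
df$Status <- ave(df$Value, df$ID, FUN = function(v) as.numeric(all(v == 1)))

print(df)
```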

I believe that case_when from the dplyr package is what you need. Here are more details on that function: https://dplyr.tidyverse.org/reference/case_when.html
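A minimal sketch of the case_when() approach, assuming dplyr 1.0 or later and, as the expected result suggests, that Status is 1 only when all of an ID's Values are 1 and 0 for every other pattern. Grouping by ID lets each condition look at all visits of that ID at once; the name result is just illustrative:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID    = rep(1:4, each = 3),
  Visit = rep(1:3, times = 4),
  Value = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)
)

result <- df %>%
  group_by(ID) %>%
  mutate(Status = case_when(
    all(Value == 1) ~ 1, # values 1, 1, 1 across the visits
    TRUE ~ 0             # any other pattern (0,1,1; 0,0,0; 1,0,0)
  )) %>%
  ungroup()

print(result)
```

Because the condition is evaluated once per group, mutate() repeats the single Status value on every visit of that ID.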

Related

Mutate multiple columns based on a condition and column name

I have a data frame with the following structure (see the example below). The dots after the OperatedIn2007 column stand for more columns with the same name pattern, changing only the year (e.g. OperatedIn2008, OperatedIn2009, etc.).
I wish to do the following:
If group is 1, then put a 1 in all columns whose names start with OperatedIn.
The expected result should look like the desired output below.
A non-scalable solution would be:
df <- df %>%
mutate(OperatedIn2006 = ifelse(group == 1, 1, 0)) %>%
[...]
I imagine there is some slick solution using dplyr or data.table, but I could not think of it myself.
Example
ID group OperatedIn2006 OperatedIn2007 ...
1 1 0 0
2 2 0 0
3 3 0 0
4 4 0 0
5 1 0 0
6 2 0 0
Desired output
ID group OperatedIn2006 OperatedIn2007 ...
1 1 1 1
2 2 0 0
3 3 0 0
4 4 0 0
5 1 1 1
6 2 0 0
We could use across with an ifelse statement:
library(dplyr)
df %>%
mutate(across(-c(ID, group), ~ifelse(group==1, 1, .)))
ID group OperatedIn2006 OperatedIn2007
1 1 1 1 1
2 2 2 0 0
3 3 3 0 0
4 4 4 0 0
5 5 1 1 1
6 6 2 0 0
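Since the question targets columns by their name prefix, a variant that selects them with starts_with() instead of excluding ID and group may be safer if the data frame has other columns. This is a sketch assuming dplyr 1.0+ (for across()) and reusing the example's column names with only two year columns:

```r
library(dplyr)

# Toy data shaped like the question's example (two year columns only)
df <- data.frame(
  ID             = 1:6,
  group          = c(1, 2, 3, 4, 1, 2),
  OperatedIn2006 = 0,
  OperatedIn2007 = 0
)

result <- df %>%
  # only columns whose names start with "OperatedIn" are modified,
  # so unrelated columns are left untouched
  mutate(across(starts_with("OperatedIn"), ~ ifelse(group == 1, 1, .x)))

print(result)
```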

Conditionally delete individuals from longitudinal data [duplicate]

This question already has answers here: Select groups which have at least one of a certain value
I have a longitudinal data set where I want to drop individuals (id) who do not fulfill the criterion criteria == 1 at any time point. To put it in context, criteria could denote whether the individual was living in the region of interest at that time point.
Using some toy data with a structure similar to mine:
id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
event <- c(0,1,0,1,0,0,0,0,0,0,1,0,1,0,1)
criteria <- c(1,0,0,0,0,0, 0, 0, 0, 1, 1, 1,0,0,1)
df <- data.frame(cbind(id,time,event, criteria))
> df
id time event criteria
1 1 1 0 1
2 1 2 1 0
3 1 3 0 0
4 2 1 1 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 0
8 3 2 0 0
9 3 3 0 0
10 4 1 0 1
11 4 2 1 1
12 4 3 0 1
13 5 1 1 0
14 5 2 0 0
15 5 3 1 1
Removing every id that has criteria == 0 at all time points should give an end result like this:
id time event criteria
1 1 1 0 1
2 1 2 1 0
3 1 3 0 0
4 4 1 0 1
5 4 2 1 1
6 4 3 0 1
7 5 1 1 0
8 5 2 0 0
9 5 3 1 1
I've been trying to achieve this with dplyr::group_by(id) followed by filter() on the criterion, but that does not give the result I want. I'd prefer a tidyverse solution! :D
Thanks!
library(dplyr)
df %>%
group_by(id) %>%
# TRUE for every row of an id that has criteria == 1 at least once
mutate(is_good = any(criteria == 1)) %>%
filter(is_good)
If you are willing to look into data.table, which I recommend, it is as simple as this:
library(data.table)
setDT(df) # make it a data.table
df[ , .SD[ !all(criteria==0) ], by=id ]
See this page for a general introduction and an explanation of the .SD idiom:
https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
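For comparison, the same filter in base R without any packages. This is a sketch assuming the toy data from the question: collect the ids that have criteria == 1 at least once, then keep all rows of those ids (keep_ids and result are illustrative names):

```r
# Toy data from the question
id       <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
time     <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
event    <- c(0,1,0,1,0,0,0,0,0,0,1,0,1,0,1)
criteria <- c(1,0,0,0,0,0,0,0,0,1,1,1,0,0,1)
df <- data.frame(id, time, event, criteria)

# ids that meet the criterion (criteria == 1) at least once
keep_ids <- unique(df$id[df$criteria == 1])

# keep every row of those ids; reset row names for a clean print
result <- df[df$id %in% keep_ids, ]
rownames(result) <- NULL
print(result)
```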

How to flag duplicate values in R - newbie

I'm trying to flag duplicate IDs in another column. I don't necessarily want to remove them yet, just create an indicator (0/1) of whether each ID is unique or duplicated. In SQL it would be something like this (MySQL-style update with a join):
UPDATE TABLE
JOIN (SELECT ID, count(ID) count FROM TABLE GROUP BY ID) a
ON TABLE.ID = a.ID
SET ID_Duplicate_Flag = 1
WHERE count > 1;
Is there a way to do this simply in r?
Any help would be greatly appreciated.
As an example of duplicated(), let's start with some values (numbers here, but strings work the same way):
x <- c(9, 1:5, 3:7, 0:8)
x
# 9 1 2 3 4 5 3 4 5 6 7 0 1 2 3 4 5 6 7 8
If you want to flag the second and later copies:
as.numeric(duplicated(x))
# 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0
If you want to flag all values that occur two or more times:
as.numeric(x %in% x[duplicated(x)])
# 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
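Applied to a column of a data frame, as in the question, another common idiom flags every copy by combining duplicated() with its fromLast = TRUE argument. A sketch, assuming the column is called ID (its values are borrowed from x above); the flag column name dup_flag is hypothetical:

```r
# Hypothetical data frame with an ID column (values borrowed from x above)
df <- data.frame(ID = c(9, 1:5, 3:7, 0:8))

# duplicated() marks the 2nd and later copies; fromLast = TRUE marks all
# but the last copy. OR-ing both flags every copy of a repeated ID.
df$dup_flag <- as.integer(duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE))

print(df)
```

The resulting 0/1 column is the same as x %in% x[duplicated(x)], just stored as integers in the data frame.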

Copy and multiply values between data frames according to the group

I have a data frame DF1. id denotes the participant's number, and there are a few observations (rows) for each participant:
id blocktype condition blocknr markodd
1 1 1 1 0
1 3 2 2 0
1 3 3 2 0
2 1 2 1 0
2 1 1 2 0
2 1 1 2 0
3 4 1 1 0
3 1 1 2 0
3 2 1 2 0
I also have another data frame DF2, with additional data, this time with single line for each person:
id taskorder exporder
1 1 1
2 2 1
3 1 2
I would like to take the value for each id from DF2 and copy it to every observation of that id, in a new column of DF1, so that I get this:
id blocktype condition blocknr markodd taskorder
1 1 1 1 0 1
1 3 2 2 0 1
1 3 3 2 0 1
2 1 2 1 0 2
2 1 1 2 0 2
2 1 1 2 0 2
3 4 1 1 0 1
3 1 1 2 0 1
3 2 1 2 0 1
Could you give me a tip on how to do it? A dplyr solution would be preferable!
Try this:
library(dplyr)
DF1 <- DF1 %>% left_join(DF2, by = "id") %>% select(all_of(colnames(DF1)), taskorder) # keep DF1's columns plus taskorder
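A base R alternative that keeps DF1's row order and adds only taskorder, sketched under the assumption that each id occurs exactly once in DF2 (the data below are the example tables from the question):

```r
# Toy versions of DF1 and DF2 from the question
DF1 <- data.frame(
  id        = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  blocktype = c(1, 3, 3, 1, 1, 1, 4, 1, 2),
  condition = c(1, 2, 3, 2, 1, 1, 1, 1, 1),
  blocknr   = c(1, 2, 2, 1, 2, 2, 1, 2, 2),
  markodd   = 0
)
DF2 <- data.frame(id = 1:3, taskorder = c(1, 2, 1), exporder = c(1, 1, 2))

# match() finds, for every row of DF1, the row of DF2 with the same id;
# indexing taskorder with it repeats the value for each observation
DF1$taskorder <- DF2$taskorder[match(DF1$id, DF2$id)]

print(DF1)
```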

How can I count occurrences across several variables in R

I have an example data.frame:
x<- data.frame(c(0,1,2,1,2,1,2),c(0,1,2,1,2,2,1),c(0,1,2,1,2,1,2),c(0,1,2,1,2,2,1))
colnames(x) <- c('PV','LA','Wiz','LAg')
I want to count the occurrences of each whole row. The result should look like this:
PV LA Wiz LAg Replace
0 0 0 0 1
1 1 1 1 2
2 2 2 2 2
1 2 1 2 1
2 1 2 1 1
The row 0 0 0 0 occurs once, the row 1 1 1 1 occurs twice, etc.
Do you have any idea how I can do this?
Maybe you want this?
as.data.frame(table(do.call(paste, x))) # paste all four columns of each row into one string, then tabulate
# Var1 Freq
#1 0 0 0 0 1
#2 1 1 1 1 2
#3 1 2 1 2 1
#4 2 1 2 1 1
#5 2 2 2 2 2
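If you prefer to keep the four columns separate instead of pasting them into one string, a dplyr sketch (assuming dplyr 0.8 or later for the name argument) counts each distinct row combination directly:

```r
library(dplyr)

# Example data from the question
x <- data.frame(
  PV  = c(0, 1, 2, 1, 2, 1, 2),
  LA  = c(0, 1, 2, 1, 2, 2, 1),
  Wiz = c(0, 1, 2, 1, 2, 1, 2),
  LAg = c(0, 1, 2, 1, 2, 2, 1)
)

# count() groups by the listed columns and counts rows per combination;
# name = "Replace" reuses the column name from the question
result <- count(x, PV, LA, Wiz, LAg, name = "Replace")

print(result)
```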
