Mark IDs that have a specific value across columns - r

I am trying to create a variable 'check' with values 1/0, assigned by ID: check should be 1 for every row of an ID whose rows, taken together, contain at least one 1 in each of the columns V1 to V3.
DF <- data.frame(ID = c(1,1,1,2,3,3,4,5,5,6),
                 V1 = c(1,0,0,1,1,0,0,1,0,0),
                 V2 = c(1,1,0,0,1,0,1,1,0,0),
                 V3 = c(0,1,0,1,0,0,0,0,1,0))
This is the code I am using, but group_by doesn't seem to have any effect: it goes across the columns and marks with 1 every row that has at least one 1, but not by ID.
DF %>%
  dplyr::group_by(ID) %>%
  dplyr::mutate(Check = case_when(if_any(V1:V3, ~ .x != 0) ~ 1, TRUE ~ 0)) %>%
  dplyr::ungroup()
So the output I am looking for is this one:
ID V1 V2 V3 check
 1  1  1  0     1
 1  0  1  1     1
 1  0  0  0     1
 2  1  0  1     0
 3  1  1  0     0
 3  0  0  0     0
 4  0  1  0     0
 5  1  1  0     1
 5  0  0  1     1
 6  0  0  0     0
Could you help?
Many thanks!
Edit: apologies, I have noticed a mistake in the output, it should be fine now.

Please check the code below. These are the steps I followed:
- after grouping by the ID column, derive new columns in which 0 is replaced with NA;
- fill() the last non-NA value downwards, so once a variable is 1 it is carried to the later rows within the group;
- sum the three new variables per row; when the row sum is 3, every variable has a 1 somewhere in the group, so set check to 1;
- fill() check in both directions within the group, and replace the remaining NAs with 0.
library(dplyr)
library(tidyr)

DF %>%
  group_by(ID) %>%
  mutate(across(starts_with('V'), ~ ifelse(.x == 0, NA, .x), .names = 'new_{col}')) %>%
  fill(starts_with('new')) %>%
  mutate(check = ifelse(rowSums(across(starts_with('new'))) == 3, 1, 0)) %>%
  fill(check, .direction = 'downup') %>%
  mutate(check = replace_na(check, 0)) %>%
  select(-starts_with('new'))
Created on 2023-02-03 with reprex v2.0.2
# A tibble: 10 × 5
# Groups: ID [6]
ID V1 V2 V3 check
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 0 1
2 1 0 1 1 1
3 1 0 0 0 1
4 2 1 0 1 0
5 3 1 1 0 0
6 3 0 0 0 0
7 4 0 1 0 0
8 5 1 1 0 1
9 5 0 0 1 1
10 6 0 0 0 0
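For comparison, the same result can be sketched in base R without any grouping verbs (a sketch, assuming the DF from the question): ave() broadcasts each column's per-ID maximum back onto the rows, and check is 1 only when all three maxima are 1.

```r
DF <- data.frame(ID = c(1,1,1,2,3,3,4,5,5,6),
                 V1 = c(1,0,0,1,1,0,0,1,0,0),
                 V2 = c(1,1,0,0,1,0,1,1,0,0),
                 V3 = c(0,1,0,1,0,0,0,0,1,0))

# Per-ID maximum of each V column, aligned back to the rows with ave()
grp_max <- sapply(DF[c("V1", "V2", "V3")],
                  function(v) ave(v, DF$ID, FUN = max))

# check is 1 only when every column has at least one 1 within the ID group
DF$check <- as.integer(rowSums(grp_max) == 3)
DF$check
# [1] 1 1 1 0 0 0 0 1 1 0
```

This avoids the fill() gymnastics because the per-group maximum already answers "does this column contain a 1 anywhere in the group".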

Related

Summarizing/counting multiple binary variables

For the purpose of this question, my data set includes 16 columns (c1_d, c2_d, ..., c16_d) and 364 rows (1-364). This is what it briefly looks like:
c1_d c2_d c3_d c4_d c5_d c6_d c7_d c8_d c9_d c10_d c11_d c12_d c13_d c14_d c15_d c16_d
1 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0
2 1 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0
3 1 1 0 1 1 1 1 1 0 1 1 0 1 0 1 0
4 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0
5 1 0 1 1 1 1 0 1 0 1 1 0 0 0 1 0
Please note that for example row 1, has five 1s and 11 0s.
This is what I'm trying to do: count how many rows have how many 1s assigned to them (i.e. by the end of this analysis I want something like: 20 rows had zero 1s, 33 rows had one 1, 100 rows had ten 1s, etc.).
I tried to create a data frame including all 364 rows and 16 columns I needed. I tried using the print.data.frame function, and its results are shown above, but it doesn't give me the number of 0s and 1s per row. I tried functions such as table, ftable, and xtabs, but they don't really work for more than three variables.
I would highly appreciate your help on this.
If I understand correctly:
library(dplyr)
library(tidyr)
df %>%
  transmute(count0 = rowSums(df == 0),
            count1 = rowSums(df == 1)) %>%
  pivot_longer(everything()) %>%
  count(name, value)
name value n
<chr> <dbl> <int>
1 count0 5 1
2 count0 6 1
3 count0 7 1
4 count0 11 1
5 count0 12 1
6 count1 4 1
7 count1 5 1
8 count1 9 1
9 count1 10 1
10 count1 11 1
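If only the distribution of 1s per row is needed, a base R sketch gets there with rowSums() and table() alone (using a small toy data frame here, since the full 364 x 16 data isn't shown):

```r
# Toy 0/1 data standing in for the 364 x 16 data frame from the question
df <- data.frame(c1_d = c(1, 0, 1),
                 c2_d = c(0, 0, 1),
                 c3_d = c(0, 1, 1))

# How many 1s each row contains
ones_per_row <- rowSums(df == 1)

# Frequency table: "n rows had k value-1 entries"
table(ones_per_row)
# Here: 2 rows contain one 1, and 1 row contains three 1s
```

table() on the row sums is exactly the "20 rows had zero 1s, 33 rows had one 1, ..." summary asked for.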

Add a column that count number of rows until the first 1, by group in R

I have the following dataset:
test_df <- data.frame(Group = c(1,1,1,1,2,2),
                      var1 = c(1,0,0,1,1,1),
                      var2 = c(0,0,1,1,0,0),
                      var3 = c(0,1,0,0,0,1))
Group var1 var2 var3
    1    1    0    0
    1    0    0    1
    1    0    1    0
    1    1    1    0
    2    1    0    0
    2    1    0    1
I want to add 3 columns (out1-3) for var1-3, which count the number of rows until the first 1, by Group, as shown below:
Group var1 var2 var3 out1 out2 out3
    1    1    0    0    1    3    2
    1    0    0    1    1    3    2
    1    0    1    0    1    3    2
    1    1    1    0    1    3    2
    2    1    0    0    1    0    2
    2    1    0    1    1    0    2
I used this R code and repeated it for my 3 variables; my actual dataset contains many more columns. But it is not working:
test_var1 <- select(test_df, Group, var1) %>%
  group_by(Group) %>%
  mutate(out1 = row_number()) %>%
  filter(var1 != 0) %>%
  slice(1)
df <- data.frame(Group=c(1,1,1,1,2,2),
var1=c(1,0,0,1,1,1),
var2=c(0,0,1,1,0,0),
var3=c(0,1,0,0,0,1))
This works for any number of variables as long as the structure is the same as in the example (i.e. Group + many variables that are 0 or 1)
library(dplyr)
library(tidyr)

df %>%
  mutate(rownr = row_number()) %>%
  pivot_longer(-c(Group, rownr)) %>%
  group_by(Group, name) %>%
  mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,
         out = ifelse(max(out) > n(), 0, max(out))) %>%
  pivot_wider(names_from = name, values_from = c(value, out)) %>%
  select(-rownr)
Returns:
Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 0 1 3 2
2 1 0 0 1 1 3 2
3 1 0 1 0 1 3 2
4 1 1 1 0 1 3 2
5 2 1 0 0 1 0 2
6 2 1 0 1 1 0 2
If you only have 3 "out" variables, then you can create the three columns as follows:
# 1 - Your dataset
df <- data.frame(Group = rep(1, 4), var1 = c(1,0,0,1), var2 = c(0,0,1,1), var3 = c(0,1,0,0))
# 2 - Find the first row number with a 1 value
df$out1 <- min(which(df$var1 == 1))
df$out2 <- min(which(df$var2 == 1))
df$out3 <- min(which(df$var3 == 1))
If you have more than 3 columns, it may be better to use a loop, for example:
for (i in 1:3) {
  df[paste0("out", i)] <- min(which(df[[paste0("var", i)]] == 1))
}
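The first-1-per-group logic can also be sketched in base R with ave() and match(), which respects the Group column and avoids the reshape (a sketch using the test_df from the question; match()'s nomatch = 0 supplies the 0 for groups that contain no 1):

```r
test_df <- data.frame(Group = c(1,1,1,1,2,2),
                      var1  = c(1,0,0,1,1,1),
                      var2  = c(0,0,1,1,0,0),
                      var3  = c(0,1,0,0,0,1))

# Position of the first 1 within the group (0 if none),
# repeated across all rows of that group so ave() keeps the length
first_one <- function(x) rep(match(1, x, nomatch = 0), length(x))

test_df[paste0("out", 1:3)] <- lapply(test_df[paste0("var", 1:3)],
                                      function(v) ave(v, test_df$Group, FUN = first_one))
```

The lapply over the var columns means it scales to any number of variables with the same naming pattern.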

Only Use The First Match For Every N Rows

I have a data.frame that looks like this.
Date Number
1 1
2 0
3 1
4 0
5 0
6 1
7 0
8 0
9 1
I would like to create a new column that puts a 1 in the column if it is the first 1 of every 3 rows. Otherwise put a 0. For example, this is how I would like the new data.frame to look
Date Number New
1 1 1
2 0 0
3 1 0
4 0 0
5 0 0
6 1 1
7 0 0
8 0 0
9 1 1
Every three rows we find the first 1 and populate the column otherwise we place a 0. Thank you.
Hmm, at first glance I thought akrun's answer provided the solution. However, it is not exactly what I am looking for. Here is what akrun's solution provides.
df1 = data.frame(Number = c(1,0,1,0,1,1,1,0,1,0,0,0))
head(df1,9)
Number
1 1
2 0
3 1
4 0
5 1
6 1
7 1
8 0
9 1
Attempt at solution:
df1 %>%
  group_by(grp = as.integer(gl(n(), 3, n()))) %>%
  mutate(New = +(Number == row_number()))
Number grp New
<dbl> <int> <int>
1 1 1 1
2 0 1 0
3 1 1 0
4 0 2 0
5 1 2 0 #should be a 1
6 1 2 0
7 1 3 1
8 0 3 0
9 1 3 0
As you can see, the code misses the 1 on row 5. I am looking for the first 1 in every chunk; everything else should be 0.
Sorry if I was unclear, akrun.
Edit: akrun's new answer is exactly what I am looking for. Thank you very much.
Here is an option: create a grouping column with gl, then compare row_number() against the index of the first 1, found with match. match returns only the index of the first match, and nomatch = 0 covers groups with no 1 at all.
library(dplyr)
df1 %>%
  group_by(grp = as.integer(gl(n(), 3, n()))) %>%
  mutate(New = +(row_number() == match(1, Number, nomatch = 0)))
# A tibble: 12 x 3
# Groups: grp [4]
# Number grp New
# <dbl> <int> <int>
# 1 1 1 1
# 2 0 1 0
# 3 1 1 0
# 4 0 2 0
# 5 1 2 1
# 6 1 2 0
# 7 1 3 1
# 8 0 3 0
# 9 1 3 0
#10 0 4 0
#11 0 4 0
#12 0 4 0
Looking at the logic, perhaps you want to check if Number == 1 and that the prior 2 values were both 0. If that is not correct please let me know.
library(dplyr)
df %>%
  mutate(New = ifelse(Number == 1 &
                        lag(Number, n = 1L, default = 0) == 0 &
                        lag(Number, n = 2L, default = 0) == 0, 1, 0))
Output
Date Number New
1 1 1 1
2 2 0 0
3 3 1 0
4 4 0 0
5 5 0 0
6 6 1 1
7 7 0 0
8 8 0 0
9 9 1 1
You can replace Number with 0 everywhere except at the first occurrence of 1 within each group of 3 rows.
library(dplyr)
df %>%
  group_by(gr = ceiling(row_number() / 3)) %>%
  mutate(New = replace(Number, -which.max(Number), 0)) %>%
  # Or, to be safe and specific, use
  # mutate(New = replace(Number, -which(Number == 1)[1], 0)) %>%
  ungroup() %>%
  select(-gr)
# A tibble: 9 x 3
# Date Number New
# <int> <int> <int>
#1 1 1 1
#2 2 0 0
#3 3 1 0
#4 4 0 0
#5 5 0 0
#6 6 1 1
#7 7 0 0
#8 8 0 0
#9 9 1 1
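For completeness, the accepted idea (mark only the first 1 within each block of 3 rows) can also be sketched in base R, with ceiling() building the block id and match() finding the first 1, mirroring the dplyr versions above:

```r
df <- data.frame(Date = 1:9, Number = c(1,0,1,0,0,1,0,0,1))

# Block id: rows 1-3 -> 1, rows 4-6 -> 2, rows 7-9 -> 3
grp <- ceiling(seq_len(nrow(df)) / 3)

# Within each block, mark only the position of the first 1 (none -> all 0)
df$New <- ave(df$Number, grp,
              FUN = function(x) as.numeric(seq_along(x) == match(1, x, nomatch = 0)))
df$New
# [1] 1 0 0 0 0 1 0 0 1
```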

Making a conditional variable based on last observation in temporal data

ID T V1
1 1 1
1 2 1
2 1 0
2 2 0
3 1 1
3 2 1
3 3 1
I need to make two variables from these data. The first (v2) needs to be 1 on the last observation only when V1 = 1; the second (v3) needs to be 1 on the last observation for all cases. Ideal final product:
ID T V1 v2 v3
1 1 1 0 0
1 2 1 1 1
2 1 0 0 0
2 2 0 0 1
3 1 1 0 0
3 2 1 0 0
3 3 1 1 1
Thanks in advance.
In the package dplyr, you can group your data by a variable (ID in your case) and perform operations within each group. As one of your columns (T) already holds the rank of each observation within its group, you can compare it with n(), which returns the number of rows of each group, to obtain what you want.
Suppose your data are in the data frame df:
df %>%
  group_by(ID) %>%
  mutate(
    v2 = 1 * (`T` == n()) * (V1 == 1),
    v3 = 1 * (`T` == n())
  )
# A tibble: 7 x 5
# Groups: ID [3]
     ID     T    V1    v2    v3
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     0     0
2     1     2     1     1     1
3     2     1     0     0     0
4     2     2     0     0     1
5     3     1     1     0     0
6     3     2     1     0     0
7     3     3     1     1     1
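The same flags can be sketched in base R with ave(), with v2/v3 as defined in the question's desired output (v3 flags every group's last observation; v2 additionally requires V1 = 1 there):

```r
df <- data.frame(ID = c(1,1,2,2,3,3,3),
                 T  = c(1,2,1,2,1,2,3),
                 V1 = c(1,1,0,0,1,1,1))

# 1 on the last observation of each ID (largest T within the group), 0 elsewhere
df$v3 <- ave(df$T, df$ID, FUN = function(x) as.numeric(x == max(x)))

# Additionally require V1 == 1 on that last observation
df$v2 <- df$v3 * (df$V1 == 1)
```

Comparing T against max(T) per group avoids assuming the rows are sorted, which `T == n()` does assume.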

Converting a categorical variable to multiple binary variables [duplicate]

This question already has answers here:
Generate a dummy-variable
(17 answers)
How do I make a dummy variable in R?
(3 answers)
Closed 5 years ago.
I wish to convert part of my data to binary wide format.
This is my input:
mydf <- data.frame(transaction = c(1,0,1,1,1,0,0),
                   quality = c("NEW", "OLD", "OLD", "OLD", "OLD", "NEW", "NEW"),
                   brand = c(1,2,3,1,2,2,1))
transaction quality brand
1 1 NEW 1
2 0 OLD 2
3 1 OLD 3
4 1 OLD 1
5 1 OLD 2
6 0 NEW 2
7 0 NEW 1
and I wish to convert the brand column to wide format so that have the following output
transaction quality brand_1 brand_2 brand_3
1 1 NEW 1 0 0
2 0 OLD 0 1 0
3 1 OLD 0 0 1
4 1 OLD 1 0 0
5 1 OLD 0 1 0
6 0 NEW 0 1 0
7 0 NEW 1 0 0
I tried different approaches such as model.matrix function but couldn't reach to my desired output.
For every row we select its corresponding column, which needs to be changed to 1. We generate the row/column combinations using seq (to select rows) and paste0 (to select columns). For all those row/column combinations we use mapply to change the corresponding values to 1 via the global assignment operator (<<-).
# Generate new columns to be added
cols <- paste0("brand-", 1:3)
# Initialise the columns to 0
mydf[cols] <- 0
mapply(function(x, y) mydf[x, y] <<- 1,
       seq(nrow(mydf)), paste0("brand-", mydf$brand))
mydf
# transaction quality brand brand-1 brand-2 brand-3
#1 1 NEW 1 1 0 0
#2 0 OLD 2 0 1 0
#3 1 OLD 3 0 0 1
#4 1 OLD 1 1 0 0
#5 1 OLD 2 0 1 0
#6 0 NEW 2 0 1 0
#7 0 NEW 1 1 0 0
We can remove the original brand column, if we no longer require it, using
mydf$brand <- NULL
For a tidy approach
library(dplyr)
library(tidyr)
library(tibble)
mydf %>%
  rownames_to_column() %>%
  group_by(rowname, transaction, quality, brand) %>%
  summarise(count = n()) %>%
  spread(brand, count, sep = "-", fill = 0) %>%
  ungroup() %>%
  select(-rowname)
# # A tibble: 7 x 5
# transaction quality `brand-1` `brand-2` `brand-3`
# * <dbl> <fctr> <dbl> <dbl> <dbl>
# 1 1 NEW 1 0 0
# 2 0 OLD 0 1 0
# 3 1 OLD 0 0 1
# 4 1 OLD 1 0 0
# 5 1 OLD 0 1 0
# 6 0 NEW 0 1 0
# 7 0 NEW 1 0 0
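Since model.matrix was mentioned as an attempted approach, here is a sketch of how it can produce the dummies: treating brand as a factor and dropping the intercept with "- 1" yields one 0/1 column per level (the brand_ renaming is purely cosmetic).

```r
mydf <- data.frame(transaction = c(1,0,1,1,1,0,0),
                   quality = c("NEW", "OLD", "OLD", "OLD", "OLD", "NEW", "NEW"),
                   brand = c(1,2,3,1,2,2,1))

# One indicator column per factor level; "- 1" removes the intercept column
dummies <- model.matrix(~ factor(brand) - 1, data = mydf)
colnames(dummies) <- paste0("brand_", levels(factor(mydf$brand)))

# Drop the original brand column and attach the indicators
mydf <- cbind(mydf[c("transaction", "quality")], dummies)
```

Unlike the mapply approach, no global assignment is needed, and the set of columns adapts automatically to the levels present in brand.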
