Create a variable with conditions on multiple other variables - R

I'm trying to create a variable based on conditions involving multiple other variables.
For example, I have 5 variables, A, B, C, D, E, each ranging from 1 to 8.
I want to create a new variable, grade, with the conditions below.
1) If any of the variables (A to E) is under 2, the grade will be 1.
2) If all of the variables are 3 or more and any of them is 3 or 4, the grade will be 2.
3) If all of the variables are more than 5, the grade will be 3.
I created the dataset test arbitrarily.
test <- data.frame(A = c(4,7,4,1,4),
                   B = c(8,8,6,5,8),
                   C = c(6,5,6,7,5),
                   D = c(7,8,7,5,8),
                   E = c(5,7,8,5,5))
test
In this case, the grade will be 2,3,2,1,2.
I tried mutate_at() with vars() and one_of(). However, it didn't return what I expected.
test <- test %>% mutate_at(
  vars(one_of("A", "B", "C", "D", "E")),
  funs(grade = case_when(. %in% c(1, 2) ~ 1,
                         min(.) %in% c(3, 4) ~ 2,
                         min(.) %in% c(5, 6, 7, 8) ~ 3)))
test
A B C D E A_grade B_grade C_grade D_grade E_grade
1 4 8 6 7 5 NA 3 3 3 3
2 7 8 5 8 7 NA 3 3 3 3
3 4 6 6 7 8 NA 3 3 3 3
4 1 5 7 5 5 1 3 3 3 3
5 4 8 5 8 5 NA 3 3 3 3
I would appreciate any help.

You can use the development version of dplyr, installed via remotes::install_github("tidyverse/dplyr"), and the new c_across() to get what you want easily. Note that the result doesn't contain a 3 because I interpreted your logic as > 5 rather than >= 5.
library(dplyr)
test <- data.frame(A = c(4,7,4,1,4),
                   B = c(8,8,6,5,8),
                   C = c(6,5,6,7,5),
                   D = c(7,8,7,5,8),
                   E = c(5,7,8,5,5))
test %>%
  rowwise() %>%
  mutate(grade = case_when(
    sum(c_across(A:E) < 2) > 0 ~ 1,
    sum(c_across(A:E) > 5) == 5 ~ 3,
    TRUE ~ 2
  ))
#> # A tibble: 5 x 6
#> # Rowwise:
#> A B C D E grade
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 8 6 7 5 2
#> 2 7 8 5 8 7 2
#> 3 4 6 6 7 8 2
#> 4 1 5 7 5 5 1
#> 5 4 8 5 8 5 2
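As a further sketch, if you do want row 2 to receive grade 3 (reading "more than 5" as >= 5, which matches the expected output 2, 3, 2, 1, 2), a vectorised alternative without rowwise() can use pmin() for the row minimum:
library(dplyr)
test %>%
  mutate(
    row_min = pmin(A, B, C, D, E),   # row-wise minimum across A..E
    grade = case_when(
      row_min < 2  ~ 1,              # any variable under 2
      row_min >= 5 ~ 3,              # all variables 5 or more
      TRUE         ~ 2               # otherwise: the minimum is 3 or 4
    )
  ) %>%
  select(-row_min)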

Related

Creating a new factor variable from multiple factor variables, all with same levels

Imagine a data frame with multiple factor columns that share the same levels but have different entries (perhaps coming from a survey).
f1=factor(sample(1:4,10,replace=T))
f2=factor(sample(1:4,10,replace=T))
f3=factor(sample(1:4,10,replace=T))
df=data.frame(id=1:10,f1,f2,f3)
I want to create a new factor variable that takes a value of 1 if at least two of the three factors are in level 1 or 2, a value of 2 if at least two of f1, f2, f3 are in level 3, a value of 3 if at least two of f1, f2, f3 are in level 4, and a value of 4 otherwise (if that case exists).
I understand it is possible to do this with deeply nested if/else statements and a long chain of logical operators, but I was wondering whether there is a more elegant solution, perhaps using dplyr functions?
In dplyr you can specify the conditions in case_when():
library(dplyr)
df %>%
  rowwise() %>%
  mutate(result = {
    vec <- c_across(f1:f3)
    case_when(sum(vec %in% 1:2) >= 2 ~ 1,
              sum(vec == 3) >= 2 ~ 2,
              sum(vec == 4) >= 2 ~ 3,
              TRUE ~ 4)
  })
# id f1 f2 f3 result
# <int> <fct> <fct> <fct> <dbl>
# 1 1 4 2 1 1
# 2 2 1 1 1 1
# 3 3 4 2 2 1
# 4 4 4 3 1 4
# 5 5 2 2 1 1
# 6 6 3 4 2 4
# 7 7 4 2 4 3
# 8 8 3 2 2 1
# 9 9 3 1 1 1
#10 10 2 1 1 1
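For larger data, a vectorised sketch that skips rowwise() is also possible (assuming dplyr >= 1.0.0 for across()), counting per-row matches with rowSums():
library(dplyr)
df %>%
  mutate(result = case_when(
    rowSums(across(f1:f3, ~ .x %in% 1:2)) >= 2 ~ 1,  # at least two in level 1 or 2
    rowSums(across(f1:f3, ~ .x == 3)) >= 2 ~ 2,      # at least two in level 3
    rowSums(across(f1:f3, ~ .x == 4)) >= 2 ~ 3,      # at least two in level 4
    TRUE ~ 4                                         # otherwise
  ))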
Check to see if this works for you:
f1=factor(sample(1:4,10,replace=T))
f2=factor(sample(1:4,10,replace=T))
f3=factor(sample(1:4,10,replace=T))
df=data.frame(id=1:10,f1,f2,f3)
df$f4 <- factor(apply(df[-1], 1, function(x) {
  # recode level 2 as 1, tabulate over levels 1, 3, 4, and find which level occurs at least twice
  y <- which(table(factor(replace(as.numeric(x), x == "2", 1), c(1, 3, 4))) > 1)
  if (length(y) == 0) 4 else y
}))
df
#> id f1 f2 f3 f4
#> 1 1 1 2 4 1
#> 2 2 1 3 2 1
#> 3 3 4 4 2 3
#> 4 4 1 4 2 1
#> 5 5 1 1 1 1
#> 6 6 1 3 3 2
#> 7 7 3 1 3 2
#> 8 8 1 3 4 4
#> 9 9 4 2 1 1
#> 10 10 2 3 3 2
Created on 2020-12-08 by the reprex package (v0.3.0)

How to mutate multiple columns with dynamic variable names using the purrr::map function?

I have a data frame as below:
df <- data.frame(
  id = c(1:5),
  a = c(3,10,4,0,15),
  b = c(2,1,1,0,3),
  c = c(12,3,0,3,1),
  d = c(9,7,8,0,0),
  e = c(1,2,0,2,2)
)
I need to add multiple columns whose names are given by a combination of a prefix and an integer from 3:5; the integer from 3:5 is also used in the sum() function:
df %>% mutate(
  usa_3 = sum(1+3),
  usa_4 = sum(1+4),
  usa_5 = sum(1+5),
  canada_3 = sum(1+3),
  canada_4 = sum(1+4),
  canada_5 = sum(1+5),
  nz_3 = sum(1+3),
  nz_4 = sum(1+4),
  nz_5 = sum(1+5)
)
The result is really simple, but I do not want to repeat nearly identical code over and over.
id a b c d e usa_3 usa_4 usa_5 canada_3 canada_4 canada_5 nz_3 nz_4 nz_5
1 1 3 2 12 9 1 4 5 6 4 5 6 4 5 6
2 2 10 1 3 7 2 4 5 6 4 5 6 4 5 6
3 3 4 1 0 8 0 4 5 6 4 5 6 4 5 6
4 4 0 0 3 0 2 4 5 6 4 5 6 4 5 6
5 5 15 3 1 0 2 4 5 6 4 5 6 4 5 6
The variable names consist of an alphabetical prefix and an integer postfix.
The postfix is also used in the sum() function as 1 + postfix.
In this case there are three prefixes and three postfixes, so the result has 9 additional columns.
I would prefer not to define a separate function outside this block of code, and I suppose the map() function in purrr may help.
Do you know how to make this work?
It is especially difficult to assign dynamic column names inside a pipe.
I found some similar questions, but they do not match my need:
Multivariate mutate
How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs
===== ADDITIONAL INFO =====
Let me clarify some conditions of this issue.
The sum(1+3), sum(1+4), ... part is actually as.factor(cutree(X, k = Y)), where X is the result of a cluster analysis and Y is the parameter defined as 3:5 in the example. cutree() is a function that defines where to cut a dendrogram stored in the result of a cluster analysis.
As for the column names usa_3, usa_4, ..., nz_5: the country names stand in for cluster-analysis methods such as Ward, McQuitty, median, etc. (seven methods), and the integers 3, 4, 5 are the parameter that defines where to cut the dendrogram, as explained.
As for the X in as.factor(cutree(X, k = Y)): the cluster analysis produces several result objects, one per method, so a further issue is how to apply the function to each of these objects.
The actual script I am currently using looks something like this:
cluste_number <- original_df %>% mutate(
  ## Ward
  ward_3 = as.factor(cutree(clst.ward, k = 3)),
  ward_4 = as.factor(cutree(clst.ward, k = 4)),
  ward_5 = as.factor(cutree(clst.ward, k = 5)),
  ward_6 = as.factor(cutree(clst.ward, k = 6)),
  ## Single
  sing_3 = as.factor(cutree(clst.sing, k = 3)),
  sing_4 = as.factor(cutree(clst.sing, k = 4)),
  sing_5 = as.factor(cutree(clst.sing, k = 5)),
  sing_6 = as.factor(cutree(clst.sing, k = 6)))
I am sorry for not clarifying the actual issue earlier; however, for the reason above, the number of countries (usa, canada, nz) and the number of parameters do not necessarily match.
Also, some suggestions using i + . do not address the issue, because the function as.factor(cutree(X, k = Y)) is used in the actual operation.
Thank you for your support.
Not sure what you are up to, but maybe this helps to clarify the issue:
library(tidyverse)
df <- data.frame(
  id = c(1:5),
  a = c(3,10,4,0,15),
  b = c(2,1,1,0,3),
  c = c(12,3,0,3,1),
  d = c(9,7,8,0,0),
  e = c(1,2,0,2,2)
)
ctry <- rep(c("usa", "ca", "nz"), each = 3)
nr <- rep(seq(3, 5), times = 3)
df %>%
  as_tibble() %>%
  bind_cols(map_dfc(seq_along(ctry), ~ 1 + nr[.x] %>%
                      rep(nrow(df))) %>%
              set_names(str_c(ctry, nr, sep = "_")))
# A tibble: 5 x 15
id a b c d e usa_3 usa_4 usa_5 ca_3 ca_4 ca_5 nz_3 nz_4 nz_5
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 3 2 12 9 1 4 5 6 4 5 6 4 5 6
2 2 10 1 3 7 2 4 5 6 4 5 6 4 5 6
3 3 4 1 0 8 0 4 5 6 4 5 6 4 5 6
4 4 0 0 3 0 2 4 5 6 4 5 6 4 5 6
5 5 15 3 1 0 2 4 5 6 4 5 6 4 5 6
I'm not sure if I understand the spirit of the problem, but here is one way to generate a data frame with the column names and values you want.
You can change ~ function(i) i + . to whatever function of i (the column being mutated) you want, and change either of the ns in setNames(n, n) to incorporate a different value into the function you're creating (the first n) or to change the names of the resulting columns (the second n).
countries <- c('usa', 'canada', 'nz')
n <- 3:5
as.data.frame(matrix(1, nrow(df), length(n))) %>%
  rename_all(~ countries) %>%
  mutate_all(map(setNames(n, n), ~ function(i) i + .)) %>%
  select(-countries) %>%
  bind_cols(df)
# usa_3 canada_3 nz_3 usa_4 canada_4 nz_4 usa_5 canada_5 nz_5 id a b c d e
# 1 4 4 4 5 5 5 6 6 6 1 3 2 12 9 1
# 2 4 4 4 5 5 5 6 6 6 2 10 1 3 7 2
# 3 4 4 4 5 5 5 6 6 6 3 4 1 0 8 0
# 4 4 4 4 5 5 5 6 6 6 4 0 0 3 0 2
# 5 4 4 4 5 5 5 6 6 6 5 15 3 1 0 2
Kind of a dirty solution, but it does what you want. It combines two map_dfc() calls.
library(dplyr)
library(purrr)
df <- tibble(id = c(1:5),
             a = c(3,10,4,0,15),
             b = c(2,1,1,0,3),
             c = c(12,3,0,3,1),
             d = c(9,7,8,0,0),
             e = c(1,2,0,2,2))
create_postfix_cols <- function(df, country, n) {
  # df = a data frame
  # country = prefix value (e.g. "canada")
  # n = vector of postfix values (e.g. 3:5)
  map2_dfc(.x = rep(country, length(n)),
           .y = n,
           ~ tibble(col = rep(1 + .y, nrow(df))) %>%
             set_names(paste(.x, .y, sep = "_")))
}
countries <- c("usa", "canada", "nz")
n <- 3:5
df %>%
  bind_cols(map_dfc(.x = countries, ~ create_postfix_cols(df, .x, n)))
# A tibble: 5 x 15
id a b c d e usa_3 usa_4 usa_5 canada_3 canada_4 canada_5
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 3 2 12 9 1 4 5 6 4 5 6
2 2 10 1 3 7 2 4 5 6 4 5 6
3 3 4 1 0 8 0 4 5 6 4 5 6
4 4 0 0 3 0 2 4 5 6 4 5 6
5 5 15 3 1 0 2 4 5 6 4 5 6
# ... with 3 more variables: nz_3 <dbl>, nz_4 <dbl>, nz_5 <dbl>
Here is a base R solution. You can rearrange columns if you would like, but this should get you started:
# Create column names using an index and country names
idx <- 3:5
countries <- c("usa", "canada", "nz")
new_columns <- unlist(lapply(countries, paste0, "_", idx))
# Adding new values using index & taking advantage of recycling
df[new_columns] <- sort(rep(1+idx, nrow(df)))
df
id a b c d e usa_3 usa_4 usa_5 canada_3 canada_4 canada_5 nz_3 nz_4 nz_5
1 1 3 2 12 9 1 4 5 6 4 5 6 4 5 6
2 2 10 1 3 7 2 4 5 6 4 5 6 4 5 6
3 3 4 1 0 8 0 4 5 6 4 5 6 4 5 6
4 4 0 0 3 0 2 4 5 6 4 5 6 4 5 6
5 5 15 3 1 0 2 4 5 6 4 5 6 4 5 6
Or, if you prefer:
# All in one long line
df[unlist(lapply(countries, paste0, "_", idx))] <- sort(rep(1+idx, nrow(df)))
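Following the additional info, the repetitive cutree() block itself could be generated programmatically. The sketch below is illustrative only and assumes the objects named in the question (hclust results such as clst.ward and clst.sing, plus original_df with one row per clustered observation):
library(dplyr)
library(purrr)
library(tidyr)

# One hclust result per method; the names become the column prefixes.
clusterings <- list(ward = clst.ward, sing = clst.sing)
ks <- 3:6

# Build one factor column per (method, k) pair, then bind them to original_df.
new_cols <- crossing(method = names(clusterings), k = ks) %>%
  pmap_dfc(function(method, k) {
    tibble(!!paste(method, k, sep = "_") := as.factor(cutree(clusterings[[method]], k = k)))
  })

cluste_number <- bind_cols(original_df, new_cols)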

Create new column based on condition from other column per group using tidy evaluation

Similar to this question but I want to use tidy evaluation instead.
df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
                date = c(1,2,3,4,5,6,7,8,9),
                speed = c(3,4,3,4,5,6,6,4,9))
> df
group date speed
1 1 1 3
2 1 2 4
3 1 3 3
4 2 4 4
5 2 5 5
6 2 6 6
7 3 7 6
8 3 8 4
9 3 9 9
The task is to create a new column (newValue) whose value equals the value of the date column (per group) where speed == 4. Example: group 1 has a newValue of 2 because date[speed == 4] is 2.
group date speed newValue
1 1 1 3 2
2 1 2 4 2
3 1 3 3 2
4 2 4 4 4
5 2 5 5 4
6 2 6 6 4
7 3 7 6 8
8 3 8 4 8
9 3 9 9 8
It works without tidy evaluation:
df %>%
  group_by(group) %>%
  mutate(newValue = date[speed == 4L])
#> # A tibble: 9 x 4
#> # Groups: group [3]
#> group date speed newValue
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 3 2
#> 2 1 2 4 2
#> 3 1 3 3 2
#> 4 2 4 4 4
#> 5 2 5 5 4
#> 6 2 6 6 4
#> 7 3 7 6 8
#> 8 3 8 4 8
#> 9 3 9 9 8
But it throws an error with tidy evaluation:
my_fu <- function(df, filter_var){
  filter_var <- sym(filter_var)
  df <- df %>%
    group_by(group) %>%
    mutate(newValue = !!filter_var[speed == 4L])
}
my_fu(df, "date")
#> Error in quos(..., .named = TRUE): object 'speed' not found
Thanks in advance.
We can place the unquoting within parentheses. Otherwise, it tries to evaluate the whole expression (filter_var[speed == 4L]) instead of filter_var alone:
library(rlang)
library(dplyr)
my_fu <- function(df, filter_var){
  filter_var <- sym(filter_var)
  df %>%
    group_by(group) %>%
    mutate(newValue = (!!filter_var)[speed == 4L])
}
my_fu(df, "date")
# A tibble: 9 x 4
# Groups: group [3]
# group date speed newValue
# <dbl> <dbl> <dbl> <dbl>
#1 1 1 3 2
#2 1 2 4 2
#3 1 3 3 2
#4 2 4 4 4
#5 2 5 5 4
#6 2 6 6 4
#7 3 7 6 8
#8 3 8 4 8
#9 3 9 9 8
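A further sketch, assuming a reasonably recent rlang/dplyr: skip sym()/!! entirely and look the column up by name with the .data pronoun, keeping the same string interface:
library(dplyr)
my_fu2 <- function(df, filter_var) {
  df %>%
    group_by(group) %>%
    # .data[[filter_var]] looks the column up by name inside the data mask
    mutate(newValue = .data[[filter_var]][speed == 4L])
}
my_fu2(df, "date")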
Alternatively, you can use sqldf and join df to a filtered copy of itself:
library(sqldf)
df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
                date = c(1,2,3,4,5,6,7,8,9),
                speed = c(3,4,3,4,5,6,6,4,9))
sqldf("SELECT df_origin.*, df4.`date` new_value FROM
       df df_origin JOIN (SELECT `group`, `date` FROM df WHERE speed = 4) df4
       ON (df_origin.`group` = df4.`group`)")

Convert lists of vectors into a single tibble data frame

I have two lists, each containing many vectors (around 500) of different lengths, and I would like to get a tibble data frame with three columns.
My reproducible example is the following:
> a
[[1]]
[1] 1 3 6
[[2]]
[1] 5 4
> b
[[1]]
[1] 3 4
[[2]]
[1] 5 6 7
I would like to get the following tibble data frame:
name index value
a 1 1
a 1 3
a 1 6
a 2 5
a 2 4
b 1 3
b 1 4
b 2 5
b 2 6
b 2 7
I would be grateful if someone could help me with this issue.
Using base R:
transform(stack(c(a = a, b = b)), name = substr(ind, 1, 1), ind = substr(ind, 2, 2))
values ind name
1 1 1 a
2 2 1 a
3 3 1 a
4 5 2 a
5 6 2 a
6 3 1 b
7 4 1 b
8 5 2 b
9 6 2 b
10 7 2 b
Using the tidyverse:
library(tidyverse)
list(a = a, b = b) %>%
  map(~ stack(setNames(.x, 1:length(.x)))) %>%
  bind_rows(.id = "name")
name values ind
1 a 1 1
2 a 2 1
3 a 3 1
4 a 5 2
5 a 6 2
6 b 3 1
7 b 4 1
8 b 5 2
9 b 6 2
10 b 7 2
Here is one option with the tidyverse:
library(tidyverse)
list(a = a, b = b) %>%
  map_df(enframe, name = "index", .id = 'name') %>%
  unnest()
# A tibble: 10 x 3
# name index value
# <chr> <int> <dbl>
# 1 a 1 1
# 2 a 1 3
# 3 a 1 6
# 4 a 2 5
# 5 a 2 4
# 6 b 1 3
# 7 b 1 4
# 8 b 2 5
# 9 b 2 6
#10 b 2 7
data
a <- list(c(1, 3, 6), c(5, 4))
b <- list(c(3, 4), c(5, 6, 7))
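A further sketch of the same reshaping, using lengths() and rep() to build the index column directly (illustrative, assuming purrr and tibble are available):
library(purrr)
library(tibble)
map_dfr(list(a = a, b = b),
        ~ tibble(index = rep(seq_along(.x), lengths(.x)),  # 1, 1, 1, 2, 2 for a
                 value = unlist(.x)),
        .id = "name")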

Count distinct values that are not the same as the current row's values

Suppose I have a data frame:
df <- data.frame(SID = sample(1:4, 15, replace = T),
                 Var1 = c(rep("A", 5), rep("B", 5), rep("C", 5)),
                 Var2 = sample(2:4, 15, replace = T))
which comes out to something like this:
SID Var1 Var2
1 4 A 2
2 3 A 2
3 4 A 3
4 3 A 3
5 1 A 4
6 1 B 2
7 3 B 2
8 4 B 4
9 4 B 4
10 3 B 2
11 2 C 2
12 2 C 2
13 4 C 4
14 2 C 4
15 3 C 3
What I hope to accomplish is to find the count of unique SIDs (see below under UPDATE: this should have said the count of unique (SID, Var1) combinations) where the given row's Var1 is excluded from the count and the count is grouped on Var2. So for the example above, I would like the output to be:
SID Var1 Var2 Count.Excluding.Var1
1 4 A 2 3
2 3 A 2 3
3 4 A 3 1
4 3 A 3 1
5 1 A 4 3
6 1 B 2 3
7 3 B 2 3
8 4 B 4 3
9 4 B 4 3
10 3 B 2 3
11 2 C 2 4
12 2 C 2 4
13 4 C 4 2
14 2 C 4 2
15 3 C 3 2
For the 1st observation, we have a count of 3 because there are 3 unique (SID, Var1) combinations for the given Var2 value (2, in this case) where Var1 != A (the Var1 value of the 1st observation) -- specifically, the count includes observations 6, 7, and 11, but not 12, because we already accounted for (SID, Var1) = (2, C), and not row 2, because we do not want Var1 to be "A". All of these rows have the same Var2 value.
I'd preferably like to use dplyr functions and the %>% operator.
UPDATE
I apologize for the confusion and my incorrect explanation above. I have corrected what I intended to ask for in the parentheses, but I am leaving my original phrasing as well because the majority of answers interpret it that way.
As for the example, I apologize for not setting the seed. There seems to have been some confusion with regard to Count.Excluding.Var1 for rows 11 and 12. With unique (SID, Var1) combinations, rows 11 and 12 should make sense, as these count rows 1, 2, 6, and 7 (or equivalently 10).
A simple mapply() can do the trick, but since the OP requested a %>%-based solution, an option could be:
df %>% mutate(Count.Excluding.Var1 =
                mapply(function(x, y) nrow(unique(df[df$Var1 != x & df$Var2 == y, 1:2])),
                       .$Var1, .$Var2))
# SID Var1 Var2 Count.Excluding.Var1
# 1 4 A 2 3
# 2 2 A 3 3
# 3 4 A 4 3
# 4 4 A 4 3
# 5 3 A 4 3
# 6 4 B 3 1
# 7 3 B 3 1
# 8 3 B 3 1
# 9 4 B 2 3
# 10 2 B 3 1
# 11 2 C 2 2
# 12 4 C 4 2
# 13 1 C 4 2
# 14 1 C 2 2
# 15 3 C 4 2
Data:
The above results are based on the original data provided by the OP:
df <- data.frame(SID=sample(1:4,15,replace=T), Var1=c(rep("A",5),rep("B",5),rep("C",5)), Var2=sample(2:4,15,replace=T))
Could not think of a dplyr solution, but here's one with apply():
df$Count <- apply(df, 1, function(x)
  length(unique(df$SID[(df$Var1 != x['Var1']) & (df$Var2 == x['Var2'])])))
# SID Var1 Var2 Count
# 1 4 A 2 3
# 2 3 A 2 3
# 3 4 A 3 1
# 4 3 A 3 1
# 5 1 A 4 2
# 6 1 B 2 3
# 7 3 B 2 3
# 8 4 B 4 3
# 9 4 B 4 3
# 10 3 B 2 3
# 11 2 C 2 3
# 12 2 C 2 3
# 13 4 C 4 2
# 14 2 C 4 2
# 15 3 C 3 2
Here is a dplyr solution, as requested. For future reference, please use set.seed() so we can reproduce your desired output with sample(); otherwise I have to enter the data by hand...
I think this is your logic: you want n_distinct(SID) for each Var2, but for each row you want to exclude rows that have the same Var1 as the current row. A key observation here is row 3, where a simple grouped summarise would yield a count of 2. Of the rows with Var2 = 3, row 3 has SID = 4, and rows 4 and 15 have SID = 3; we don't count row 3 or row 4, so the final count is one unique SID.
First we get the count of unique SID for each Var2, then the count of unique SID for each (Var1, Var2) combination. The first count is too large by the number of additional unique SIDs for each combination, so we subtract it and add one. There is an edge case where a Var1 has only one corresponding Var2; this should return 0, since you exclude all the possible values of SID. I added two rows to illustrate this.
library(tidyverse)
df <- read_table2(
"SID Var1 Var2
4 A 2
3 A 2
4 A 3
3 A 3
1 A 4
1 B 2
3 B 2
4 B 4
4 B 4
3 B 2
2 C 2
2 C 2
4 C 4
2 C 4
3 C 3
1 D 5
2 D 5"
)
df %>%
  group_by(Var2) %>%
  mutate(SID_per_Var2 = n_distinct(SID)) %>%
  group_by(Var1, Var2) %>%
  mutate(SID_per_Var1Var2 = n_distinct(SID)) %>%
  ungroup() %>%
  add_count(Var1) %>%
  add_count(Var1, Var2) %>%
  mutate(
    Count.Excluding.Var1 = if_else(
      n > nn,
      SID_per_Var2 - SID_per_Var1Var2 + 1,
      0
    )
  ) %>%
  select(SID, Var1, Var2, Count.Excluding.Var1)
#> # A tibble: 17 x 4
#> SID Var1 Var2 Count.Excluding.Var1
#> <int> <chr> <int> <dbl>
#> 1 4 A 2 3.
#> 2 3 A 2 3.
#> 3 4 A 3 1.
#> 4 3 A 3 1.
#> 5 1 A 4 3.
#> 6 1 B 2 3.
#> 7 3 B 2 3.
#> 8 4 B 4 3.
#> 9 4 B 4 3.
#> 10 3 B 2 3.
#> 11 2 C 2 4.
#> 12 2 C 2 4.
#> 13 4 C 4 2.
#> 14 2 C 4 2.
#> 15 3 C 3 2.
#> 16 1 D 5 0.
#> 17 2 D 5 0.
Created on 2018-04-12 by the reprex package (v0.2.0).
Here's a solution using purrr - you can wrap this in a mutate statement if you want, but I don't know that it adds much in this particular case.
library(dplyr)
library(purrr)
df$Count.Excluding.Var1 <- map_int(1:nrow(df), function(n) {
  df %>% filter(Var2 == Var2[n], Var1 != Var1[n]) %>% distinct() %>% nrow()
})
(Updated with input from comments by Calum You. Thanks!)
A 100% tidyverse solution:
library(tidyverse) # dplyr + purrr
df %>%
  group_by(Var2) %>%
  mutate(count = map_int(Var1, ~ n_distinct(SID[.x != Var1], Var1[.x != Var1])))
# # A tibble: 15 x 4
# # Groups: Var2 [3]
# SID Var1 Var2 count
# <int> <chr> <int> <int>
# 1 4 A 2 3
# 2 3 A 2 3
# 3 4 A 3 1
# 4 3 A 3 1
# 5 1 A 4 3
# 6 1 B 2 3
# 7 3 B 2 3
# 8 4 B 4 3
# 9 4 B 4 3
# 10 3 B 2 3
# 11 2 C 2 4
# 12 2 C 2 4
# 13 4 C 4 2
# 14 2 C 4 2
# 15 3 C 3 2
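A grouped, fully vectorised sketch of the updated interpretation (illustrative, not one of the original answers): count the distinct (SID, Var1) pairs per Var2, then subtract the distinct pairs sharing the current row's Var1. For the example data this reproduces the expected output.
library(dplyr)
df %>%
  group_by(Var2) %>%
  mutate(pairs_in_Var2 = n_distinct(SID, Var1)) %>%        # distinct (SID, Var1) pairs for this Var2
  group_by(Var2, Var1) %>%
  mutate(Count.Excluding.Var1 =
           pairs_in_Var2 - n_distinct(SID, Var1)) %>%      # drop the pairs with the current Var1
  ungroup() %>%
  select(-pairs_in_Var2)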
