Hello, I have a matrix such as:
COL1 COL2 COL3
A "A" "B" NA
B "B" "B" "C"
C NA NA NA
D "B" "B" "B"
E NA NA "C"
F "A" "A" "C"
and I would like, for each row (A, B, C, D, etc.), to get the number of letters that are A or B.
Example:
Nb
A 2
B 2
C 0
D 3
E 0
F 2
Does someone have an idea?
Another way is to use sapply:
df$n <- sapply(1:nrow(df), function(i) sum((df[i,] %in% c('A', 'B'))))
# COL1 COL2 COL3 n
# A A B <NA> 2
# B B B C 2
# C <NA> <NA> <NA> 0
# D B B B 3
# E <NA> <NA> C 0
# F A A C 2
You can achieve the same output by using purrr::map_dbl as well. Just replace sapply with map_dbl.
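As a vectorised alternative to looping over row indices, the membership test can be run once per column and the hits summed across each row. This is only a sketch in base R: the name `cols` is introduced here for illustration, and the sample data from the question is rebuilt so the snippet runs on its own.

```r
# Sample data from the question (row names A-F).
df <- data.frame(
  COL1 = c("A", "B", NA, "B", NA, "A"),
  COL2 = c("B", "B", NA, "B", NA, "A"),
  COL3 = c(NA, "C", NA, "B", "C", "C"),
  row.names = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Columns to inspect (illustrative name, not from the original answers).
cols <- c("COL1", "COL2", "COL3")

# One logical column per input column, then count the TRUEs in each row.
# NA %in% c("A", "B") is FALSE, so missing values are never counted.
hits <- sapply(df[cols], `%in%`, c("A", "B"))
df$n <- rowSums(hits)

df$n
# [1] 2 2 0 3 0 2
```

Because the work is done column by column, this tends to scale better than a per-row loop when the data frame has many rows and few columns.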
You can try a base R solution with apply():
#Base R
df$Var <- apply(df,1,function(x) length(which(!is.na(x) & x %in% c('A','B'))))
Output:
COL1 COL2 COL3 Var
A A B <NA> 2
B B B C 2
C <NA> <NA> <NA> 0
D B B B 3
E <NA> <NA> C 0
F A A C 2
Some data used:
#Data
df <- structure(list(COL1 = c("A", "B", NA, "B", NA, "A"), COL2 = c("B",
"B", NA, "B", NA, "A"), COL3 = c(NA, "C", NA, "B", "C", "C")), row.names = c("A",
"B", "C", "D", "E", "F"), class = "data.frame")
Or if you feel curious about tidyverse:
library(tidyverse)
#Code
df %>% mutate(id=1:n()) %>%
left_join(df %>% mutate(id=1:n()) %>%
pivot_longer(cols = -id) %>%
filter(value %in% c('A','B')) %>%
group_by(id) %>%
summarise(Var=n())) %>% ungroup() %>%
replace(is.na(.),0) %>% select(-id)
Output:
COL1 COL2 COL3 Var
1 A B 0 2
2 B B C 2
3 0 0 0 0
4 B B B 3
5 0 0 C 0
6 A A C 2
library(dplyr)
df <- structure(list(COL1 = c("A", "B", NA, "B", NA, "A"), COL2 = c("B",
"B", NA, "B", NA, "A"), COL3 = c(NA, "C", NA, "B", "C", "C")), row.names = c("A",
"B", "C", "D", "E", "F"), class = "data.frame")
df %>%
rowwise() %>%
mutate(sumVar = across(c(COL1:COL3),~ifelse(. %in% c("A", "B"),1,0)) %>% sum)
# A tibble: 6 x 4
# Rowwise:
COL1 COL2 COL3 sumVar
<chr> <chr> <chr> <dbl>
1 A B NA 2
2 B B C 2
3 NA NA NA 0
4 B B B 3
5 NA NA C 0
6 A A C 2
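If rowwise() turns out to be slow on larger data, one possible variation is to let across() build the logical columns and hand them to rowSums() in a single mutate(). This is a sketch that assumes dplyr 1.0.0 or later (for across()); the result name `res` is illustrative.

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  COL1 = c("A", "B", NA, "B", NA, "A"),
  COL2 = c("B", "B", NA, "B", NA, "A"),
  COL3 = c(NA, "C", NA, "B", "C", "C"),
  stringsAsFactors = FALSE
)

# across() returns one logical column per selected column;
# rowSums() then counts the TRUE values in each row.
res <- df %>%
  mutate(sumVar = rowSums(across(COL1:COL3, ~ .x %in% c("A", "B"))))

res$sumVar
```

No grouping is involved, so there is no need to call ungroup() afterwards.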
I have a data frame, which looks like this:
DF_A <- data.frame(
Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A")
)
I would like to assign a consecutive number within each Group_1 ID, so that identical Group_2 IDs share the same number. For example, A+A starts with 1, A+B proceeds with 2 (same Group_1 ID, but new Group_2 ID), ..., and A+A is again 1 (a repetition). B+A is 1 (new Group_1 ID), B+B is 2 (same Group_1 ID, but new Group_2 ID), and so forth.
The result should look like this.
DF_B <- data.frame(
Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A"),
ID = c(1, 2, 3, 1, 2, 1, 2, 1, 3, 1)
)
I looked through various posts on related approaches, such as numbering single groups or groups within groups, without success; this case does not seem to be covered by previous posts.
Thank you in advance.
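One way to do it with ave is shown below. ave returns a vector of the same type as its first argument, so the result is converted with as.integer:
DF_A$ID <- as.integer(ave(as.character(DF_A$Group_2), DF_A$Group_1, FUN = function(x) match(x, unique(x))))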
DF_A
# Group_1 Group_2 ID
#1 A A 1
#2 A B 2
#3 A C 3
#4 A A 1
#5 A B 2
#6 B A 1
#7 B B 2
#8 B A 1
#9 B C 3
#10 C A 1
The equivalent dplyr way is:
library(dplyr)
DF_A %>%
group_by(Group_1) %>%
mutate(ID = match(Group_2, unique(Group_2)))
You can split the data into groups by Group_1, create a factor from Group_2 within each group, and then convert it to integer:
DF_A$ID <- unlist(by(DF_A, DF_A$Group_1, function(x) as.integer(factor(x$Group_2))))
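One caveat with unlist(by(...)) is that the values come back in group order, so the assignment lines up only when the rows are already sorted by Group_1 (as they are in the sample data). A minimal sketch of an order-safe variant uses split() and unsplit(); the small unsorted data frame `DF` and the name `ids` are made up for illustration.

```r
# Deliberately unsorted example data (hypothetical, not from the question).
DF <- data.frame(
  Group_1 = c("B", "A", "B", "A"),
  Group_2 = c("X", "Y", "Z", "Y"),
  stringsAsFactors = FALSE
)

# Number the Group_2 values within each Group_1 group.
ids <- lapply(split(DF$Group_2, DF$Group_1),
              function(x) as.integer(factor(x)))

# unsplit() puts each value back at the position of its original row.
DF$ID <- unsplit(ids, DF$Group_1)

DF$ID
# [1] 1 1 2 1
```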
We can use dense_rank from dplyr.
library(dplyr)
DF_A2 <- DF_A %>%
group_by(Group_1) %>%
mutate(ID = dense_rank(Group_2)) %>%
ungroup()
DF_A2
# # A tibble: 10 x 3
# Group_1 Group_2 ID
# <fct> <fct> <int>
# 1 A A 1
# 2 A B 2
# 3 A C 3
# 4 A A 1
# 5 A B 2
# 6 B A 1
# 7 B B 2
# 8 B A 1
# 9 B C 3
# 10 C A 1
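Note that ranking numbers the Group_2 values by sort order, while match(x, unique(x)) numbers them by first appearance. The two agree on the sample data only because Group_2 first appears in alphabetical order within every group. The base R sketch below shows the difference on a small made-up data frame; as.integer(factor()) is used here as a stand-in for dense_rank, and all names are illustrative.

```r
# Hypothetical data where Group_2 does not appear in alphabetical order.
DF <- data.frame(
  Group_1 = c("A", "A", "A", "B", "B"),
  Group_2 = c("C", "A", "C", "B", "A"),
  stringsAsFactors = FALSE
)

rows <- seq_len(nrow(DF))

# Numbering by first appearance within each Group_1.
first_seen <- ave(rows, DF$Group_1, FUN = function(i) {
  match(DF$Group_2[i], unique(DF$Group_2[i]))
})

# Numbering by sorted value within each Group_1 (what a dense rank gives).
by_sort_order <- ave(rows, DF$Group_1, FUN = function(i) {
  as.integer(factor(DF$Group_2[i]))
})

first_seen
# [1] 1 2 1 1 2
by_sort_order
# [1] 2 1 2 2 1
```

Which numbering is wanted depends on whether "consecutive" should follow the order of the rows or the order of the labels.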
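You could use the integer values of the factor levels. Note that this numbers Group_2 by its overall factor levels rather than within each Group_1. The original version relied on Group_2 being a factor and on c() dropping the factor attribute; data.frame() creates character columns by default from R 4.0.0 and c() keeps factors from R 4.1.0, so an explicit as.integer(factor()) is more portable.
within(DF_A, { ID = ave(as.integer(factor(Group_2)), Group_1, FUN = c) })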
# Group_1 Group_2 ID
# 1 A A 1
# 2 A B 2
# 3 A C 3
# 4 A A 1
# 5 A B 2
# 6 B A 1
# 7 B B 2
# 8 B A 1
# 9 B C 3
# 10 C A 1
I have a database with several columns (>20), and 2 of these columns hold the subject names. I would like to add another column containing a number that identifies the combination of the two subjects.
Here is an example with only the 2 columns of names (I don't include the others for convenience):
ID1 ID2
A B
A C
A B
B C
A B
B A
C B
And here is what I would like to create:
ID1 ID2 CODE
A B 1
A C 2
A B 1
B C 3
A B 1
B A 1
C B 3
I am kind of new to R and I think it can be done with stringr, but I am not sure how.
Thanks for the help!
Simo
df$CODE <- as.integer(
factor(
apply(df[c("ID1", "ID2")], 1, function(x) paste0(sort(x), collapse = ""))
)
)
# ID1 ID2 CODE
# 1 A B 1
# 2 A C 2
# 3 A B 1
# 4 B C 3
# 5 A B 1
# 6 B A 1
# 7 C B 3
Data
df <- data.frame(
ID1 = c("A", "A", "A", "B", "A", "B", "C"),
ID2 = c("B", "C", "B", "C", "B", "A", "B")
)
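For two columns, the row-wise apply() can also be avoided: pmin() and pmax() work element-wise on character vectors, so they give the "smaller" and "larger" name of each pair directly. The following is a sketch under the assumption that ID1 and ID2 are character columns (not factors); `key` is an illustrative helper name.

```r
df <- data.frame(
  ID1 = c("A", "A", "A", "B", "A", "B", "C"),
  ID2 = c("B", "C", "B", "C", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Order-independent key: "A_B" for both (A, B) and (B, A).
key <- paste(pmin(df$ID1, df$ID2), pmax(df$ID1, df$ID2), sep = "_")

# Number the keys in order of first appearance.
df$CODE <- match(key, unique(key))

df$CODE
# [1] 1 2 1 3 1 1 3
```

Using a separator in the key avoids accidental collisions when names have different lengths (for example "AB" + "C" versus "A" + "BC").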
Try this:
library(dplyr)
#Code
new <- df %>% rowwise() %>%
mutate(Var = paste0(sort(c(ID1, ID2)), collapse = '')) %>%
group_by(Var) %>%
mutate(CODE=cur_group_id()) %>%
ungroup() %>%
select(-Var)
Output:
# A tibble: 7 x 3
ID1 ID2 CODE
<chr> <chr> <int>
1 A B 1
2 A C 2
3 A B 1
4 B C 3
5 A B 1
6 B A 1
7 C B 3
Some data used:
#Data
df <- structure(list(ID1 = c("A", "A", "A", "B", "A", "B", "C"), ID2 = c("B",
"C", "B", "C", "B", "A", "B")), class = "data.frame", row.names = c(NA,
-7L))
I have a dataframe of the following type
ID case1 case2 case3 case4
1 A B C D
2 B A
3 E F
4 G C A
5 T
I need to change its format to a long shape, similar to the below:
ID col1 col2
1 A B
1 A C
1 A D
1 B C
1 B D
1 C D
2 B A
3 E F
4 G C
4 G A
4 C A
5 T
As you can see, I need to maintain the ID and ignore empty columns. There are some cases like T that need to remain in the dataset, but without a col2.
I am honestly not sure how to approach this, so that is why there are no examples of what I have tried.
You can get the data in long format and create all combination of values for each ID if the number of rows is greater than 1 in that ID.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -ID, values_drop_na = TRUE) %>%
group_by(ID) %>%
summarise(value = if(n() > 1) list(setNames(as.data.frame(t(combn(value, 2))),
c('col1', 'col2')))
else list(data.frame(col1 = value[1], col2 = NA_character_))) %>%
unnest(value)
# A tibble: 12 x 3
# ID col1 col2
# <int> <chr> <chr>
# 1 1 A B
# 2 1 A C
# 3 1 A D
# 4 1 B C
# 5 1 B D
# 6 1 C D
# 7 2 B A
# 8 3 E F
# 9 4 G C
#10 4 G A
#11 4 C A
#12 5 T NA
data
df <- structure(list(ID = 1:5, case1 = c("A", "B", "E", "G", "T"),
case2 = c("B", "A", "F", "C", NA), case3 = c("C", NA, NA,
"A", NA), case4 = c("D", NA, NA, NA, NA)),
class = "data.frame", row.names = c(NA, -5L))
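The core step in the answer above is combn(value, 2), with a special case for IDs that have a single value. A base R sketch of just that step, wrapped in a hypothetical helper called `pairs_for`, might look like this:

```r
# Hypothetical helper: all unordered pairs of the non-missing values,
# or a single row with col2 = NA when fewer than two values remain.
pairs_for <- function(values) {
  values <- values[!is.na(values)]
  if (length(values) < 2) {
    return(data.frame(col1 = values[1], col2 = NA_character_,
                      stringsAsFactors = FALSE))
  }
  m <- t(combn(values, 2))  # one row per pair, in order of appearance
  data.frame(col1 = m[, 1], col2 = m[, 2], stringsAsFactors = FALSE)
}

# ID 4 from the example: G, C, A and one empty column.
p4 <- pairs_for(c("G", "C", "A", NA))
p4

# ID 5 from the example: only T.
p5 <- pairs_for(c("T", NA, NA, NA))
p5
```

Applying the helper to each row and binding the results (for example with lapply() and do.call(rbind, ...)) would reproduce the long format without tidyr.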
I want to add a constant to rows of a new column that match a certain condition in another column.
My simulated data:
df <- structure(list(var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
var3 = c("c", "c", "c", "c", "d", "c", "c", "a")),
.Names = c("var1", "var2", "var3"),
row.names = c(NA, 8L),
class = "data.frame")
which looks like this:
> df
var1 var2 var3
1 a b c
2 b b c
3 c a c
4 a b c
5 a b d
6 a c c
7 a a c
8 d c a
Now I would like to add a newvar that increases by 1 if var1 equals a, by a further 1 if var2 equals b, and by a further 1 if var3 equals c. That is, my data should look like:
> df
var1 var2 var3 newvar
1 a b c 3
2 b b c 2
3 c a c 1
4 a b c 3
5 a b d 2
6 a c c 2
7 a a c 2
8 d c a 0
I have tried the following, but it will only replace the values with 1, not increase them by 1:
df$newvar[df$var1 == "a"] <- +1
df$newvar[df$var1 == "b"] <- +1
df$newvar[df$var1 == "c"] <- +1
We can use rowwise in dplyr and count the number of conditions that are satisfied for each row.
library(dplyr)
df %>%
rowwise() %>%
mutate(new_var = sum(c(var1 == "a", var2 == "b" , var3 == "c")))
# var1 var2 var3 new_var
# <chr> <chr> <chr> <int>
#1 a b c 3
#2 b b c 2
#3 c a c 1
#4 a b c 3
#5 a b d 2
#6 a c c 2
#7 a a c 2
#8 d c a 0
Or a base R method:
df$new_var <- Reduce("+", list(df$var1 == "a", df$var2 == "b", df$var3 == "c"))
A quick way following your path and using base R is:
df$newVar = 0
df$newVar[df$var1 == "a"] <- df$newVar[df$var1 == "a"] +1
df$newVar[df$var2 == "b"] <- df$newVar[df$var2 == "b"] +1
df$newVar[df$var3 == "c"] <- df$newVar[df$var3 == "c"] +1
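Since TRUE and FALSE coerce to 1 and 0 in arithmetic, the three incremental assignments can also be collapsed into one expression, and each condition can carry its own constant. In this sketch the weights 1, 2 and 5 are arbitrary values chosen for illustration, and `weighted` is a made-up column name.

```r
df <- data.frame(
  var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
  var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
  var3 = c("c", "c", "c", "c", "d", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# Plain count: each satisfied condition adds 1.
df$newVar <- (df$var1 == "a") + (df$var2 == "b") + (df$var3 == "c")

# Weighted variant with arbitrary, illustrative constants.
df$weighted <- 1 * (df$var1 == "a") +
  2 * (df$var2 == "b") +
  5 * (df$var3 == "c")

df$newVar
# [1] 3 2 1 3 2 2 2 0
df$weighted
# [1] 8 7 5 8 3 6 6 0
```

If any of the columns contain NA, the comparison yields NA and so does the sum; wrapping each condition in `%in%` instead of `==` is one way to treat missing values as "no match".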
Another way that uses ifelse and mutate instead of the rowwise solution above would be:
library(dplyr)
df %>% mutate(newVar = ifelse(var1 == "a",1,0) + ifelse(var2 == "b",1,0) +
ifelse(var3 == "c",1,0))
Then you can adjust the constants to any value you like. If you want to include the new column in your dataframe just assign the result of mutate to your dataframe:
df <- df %>%
mutate(newVar = ifelse(var1 == "a",1,0) + ifelse(var2 == "b",1,0) +
ifelse(var3 == "c",1,0))
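We can use rowSums on the three original columns (selecting them explicitly keeps the comparison correct if df already has extra columns):
vars <- df[c("var1", "var2", "var3")]
df$newVar <- rowSums(vars == c('a', 'b', 'c')[col(vars)])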
df$newVar
#[1] 3 2 1 3 2 2 2 0
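When the list of conditions grows, it can be easier to keep the column-to-value pairs in one named vector and compare them in a loop-free way. The following is a sketch only; `targets` and `hits` are illustrative names that do not appear in the answers above.

```r
df <- data.frame(
  var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
  var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
  var3 = c("c", "c", "c", "c", "d", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# Column name -> value that earns one point (illustrative mapping).
targets <- c(var1 = "a", var2 = "b", var3 = "c")

# Compare each named column with its target; the result is a
# logical matrix with one column per condition.
hits <- mapply(function(column, target) column == target,
               df[names(targets)], targets)

df$newvar <- rowSums(hits)

df$newvar
# [1] 3 2 1 3 2 2 2 0
```

Adding a fourth condition then only requires a new entry in `targets`, and any other columns in the data frame are ignored.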