Adding constant to dataframe column conditional on other column - r

I want to add a constant to rows of a new column that match a certain condition in another column.
My simulated data:
df <- structure(list(var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
var3 = c("c", "c", "c", "c", "d", "c", "c", "a")),
.Names = c("var1", "var2", "var3"),
row.names = c(NA, 8L),
class = "data.frame")
which looks like this:
> df
var1 var2 var3
1 a b c
2 b b c
3 c a c
4 a b c
5 a b d
6 a c c
7 a a c
8 d c a
Now I would like to add a newvar that starts at 0 and increases by 1 if var1 equals a, by a further 1 if var2 equals b, and by a further 1 if var3 equals c. That is, my data should look like:
> df
var1 var2 var3 newvar
1 a b c 3
2 b b c 2
3 c a c 1
4 a b c 3
5 a b d 2
6 a c c 2
7 a a c 2
8 d c a 0
I have tried the following, but it will only replace the values with 1, not increase them by 1:
df$newvar[df$var1 == "a"] <- +1
df$newvar[df$var1 == "b"] <- +1
df$newvar[df$var1 == "c"] <- +1

We can use rowwise in dplyr and count the number of conditions that are satisfied for each row.
library(dplyr)
df %>%
rowwise() %>%
mutate(new_var = sum(c(var1 == "a", var2 == "b" , var3 == "c")))
# var1 var2 var3 new_var
# <chr> <chr> <chr> <int>
#1 a b c 3
#2 b b c 2
#3 c a c 1
#4 a b c 3
#5 a b d 2
#6 a c c 2
#7 a a c 2
#8 d c a 0
Or, with base R:
df$new_var <- Reduce("+", list(df$var1 == "a", df$var2 == "b", df$var3 == "c"))
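The Reduce call works because R coerces TRUE/FALSE to 1/0 in arithmetic. As a minimal sketch of the same idea (the data is rebuilt here only so the snippet runs on its own), the three comparisons can also be added directly:

```r
df <- data.frame(
  var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
  var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
  var3 = c("c", "c", "c", "c", "d", "c", "c", "a"),
  stringsAsFactors = FALSE
)
# Each comparison yields a logical vector; `+` coerces TRUE/FALSE to 1/0
df$new_var <- (df$var1 == "a") + (df$var2 == "b") + (df$var3 == "c")
df$new_var
#[1] 3 2 1 3 2 2 2 0
```

Reduce is mainly convenient when the list of conditions is built programmatically; for a fixed handful of conditions the explicit sum reads about as well.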

A quick way following your path and using base R is:
df$newVar = 0
df$newVar[df$var1 == "a"] <- df$newVar[df$var1 == "a"] +1
df$newVar[df$var2 == "b"] <- df$newVar[df$var2 == "b"] +1
df$newVar[df$var3 == "c"] <- df$newVar[df$var3 == "c"] +1

Another way that uses ifelse and mutate instead of the rowwise solution above would be:
library(dplyr)
df %>% mutate(newVar = ifelse(var1 == "a",1,0) + ifelse(var2 == "b",1,0) +
ifelse(var3 == "c",1,0))
Then you can adjust the constants to any value you like. If you want to include the new column in your dataframe just assign the result of mutate to your dataframe:
df <- df %>%
mutate(newVar = ifelse(var1 == "a",1,0) + ifelse(var2 ==
"b",1,0) + ifelse(var3 == "c",1,0))
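To illustrate adjusting the constants, here is a small base R sketch. The weights 1, 2 and 5 are hypothetical values chosen only for this example; they are not from the question.

```r
df <- data.frame(
  var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
  var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
  var3 = c("c", "c", "c", "c", "d", "c", "c", "a"),
  stringsAsFactors = FALSE
)
# Hypothetical constants, one per condition (assumed for illustration)
w <- c(var1 = 1, var2 = 2, var3 = 5)
# A logical times a number gives the number where TRUE and 0 where FALSE
df$newVar <- w[["var1"]] * (df$var1 == "a") +
  w[["var2"]] * (df$var2 == "b") +
  w[["var3"]] * (df$var3 == "c")
df$newVar
#[1] 8 7 5 8 3 6 6 0
```

Multiplying the logical vector by the constant is equivalent to ifelse(condition, constant, 0) as long as the columns contain no NA.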

We can use rowSums
df$newVar <- rowSums(df == c('a', 'b', 'c')[col(df)])
df$newVar
#[1] 3 2 1 3 2 2 2 0
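The rowSums one-liner is compact, so here is a sketch that splits it into steps. The intermediate names target and hits are introduced only for this illustration.

```r
df <- data.frame(
  var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
  var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
  var3 = c("c", "c", "c", "c", "d", "c", "c", "a"),
  stringsAsFactors = FALSE
)
# col(df) holds the column index of every cell, so indexing the targets with it
# repeats each target once per row, in column-major order
target <- c("a", "b", "c")[col(df)]
# Logical matrix: TRUE where a cell equals the target of its own column
hits <- df == target
counts <- unname(rowSums(hits))
counts
#[1] 3 2 1 3 2 2 2 0
```

Note that this has to run before newVar is added to df, because the comparison is applied to every column of the data frame.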

Related

I'm looking for a way to swap the values of 2 columns in a row when the inverse pair exists elsewhere in the dataframe.

Here is my dataframe:
DF <- data.frame(
VAR1 = c("A", "A", "B", "B", "B", "C", "C"),
VAR2 = c("B", "C", "A", "D", "C", "B", "D"),
VAR3 = c(1, 1, 1, 2, 4, 6, 4)
)
I would like to have this:
VAR1 VAR2 VAR3
A B 2
A C 1
B D 2
B C 10
C D 4
If there are two rows like (VAR1=A, VAR2=B, VAR3=X) and (VAR1=B, VAR2=A, VAR3=Y), I want a single row (VAR1=A, VAR2=B, VAR3=X+Y). That is, if the first two variables of two rows are inverses of each other, I would like one row holding the sum of their VAR3 values.
I tried to create a column that says "Yes" if two rows have inverse values, but I can't find a way to do it.
My code:
DF <- DF %>%
mutate(VAR4 = case_when(VAR2 %in% DF$VAR1 &
VAR1 %in%
(DF %>%
filter(VAR1 == VAR2) %>%
pull(VAR2)
) ~ "Yes",
TRUE ~ 'No' ))
This is the result:
VAR1 VAR2 VAR3 VAR4
A B 1 No
A C 1 No
B A 1 No
B D 2 No
B C 4 No
C B 6 No
C D 4 No
My code doesn't work because my filter doesn't take the result of VAR2 %in% DF$VAR1 into account.
Does anyone have an idea?
You can sort the first two columns within each row with apply, and then summarise:
library(dplyr)
DF[1:2] <- t(apply(DF[1:2], 1, sort))
DF %>%
group_by(VAR1, VAR2) %>%
summarise(VAR3 = sum(VAR3))
# A tibble: 5 × 3
# Groups: VAR1 [3]
VAR1 VAR2 VAR3
<chr> <chr> <dbl>
1 A B 2
2 A C 1
3 B C 10
4 B D 2
5 C D 4
Or, in a single pipe (pmap and set_names come from purrr, unnest_wider from tidyr):
DF %>%
mutate(VAR = pmap(., ~ sort(c(..1, ..2)) %>%
set_names(., c("VAR1", "VAR2")))) %>%
group_by(VAR) %>%
summarise(VAR3 = sum(VAR3)) %>%
unnest_wider(VAR)
You could try:
library(dplyr)
DF %>%
mutate(across(VAR1:VAR2, as.character)) %>%
group_by(idx1 = pmin(VAR1, VAR2), idx2 = pmax(VAR1, VAR2)) %>%
summarise(VAR3 = sum(VAR3)) %>%
rename_with(~ sub('idx', 'VAR', .)) %>%
ungroup
Output:
# A tibble: 5 x 3
VAR1 VAR2 VAR3
<chr> <chr> <dbl>
1 A B 2
2 A C 1
3 B C 10
4 B D 2
5 C D 4
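The pmin/pmax trick works because the element-wise minimum and maximum of the two columns give the same pair whichever way round the values were stored. As a hedged sketch of the same idea without dplyr (the helper data frame keyed is a name made up for this example), base R's aggregate can do the summing:

```r
DF <- data.frame(
  VAR1 = c("A", "A", "B", "B", "B", "C", "C"),
  VAR2 = c("B", "C", "A", "D", "C", "B", "D"),
  VAR3 = c(1, 1, 1, 2, 4, 6, 4),
  stringsAsFactors = FALSE
)
# Order-independent key: smaller value first, larger value second
keyed <- data.frame(
  VAR1 = pmin(DF$VAR1, DF$VAR2),
  VAR2 = pmax(DF$VAR1, DF$VAR2),
  VAR3 = DF$VAR3,
  stringsAsFactors = FALSE
)
# Sum VAR3 within each unordered pair
res <- aggregate(VAR3 ~ VAR1 + VAR2, data = keyed, FUN = sum)
res
```

pmin and pmax compare character vectors using the session's collation order, which is harmless for plain letters like these. The row order of the aggregate result may differ from the dplyr output.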

How to add a column to identify specific combination of values in R?

I have a dataset with several columns (>20), and 2 of these columns hold the subject names. I would like to add another column containing a number that identifies the combination of the two subjects.
Here is an example with only the 2 columns of names (I don't include the others for convenience):
ID1 ID2
A B
A C
A B
B C
A B
B A
C B
And here is what I would like to create:
ID1 ID2 CODE
A B 1
A C 2
A B 1
B C 3
A B 1
B A 1
C B 3
I am kind of new to R and I think it can be done with stringr, but I am not sure how.
Thanks for the help!
Simo
df$CODE <- as.integer(
factor(
apply(df, 1, function(x) paste0(sort(x), collapse = ""))
)
)
# ID1 ID2 CODE
# 1 A B 1
# 2 A C 2
# 3 A B 1
# 4 B C 3
# 5 A B 1
# 6 B A 1
# 7 C B 3
Data
df <- data.frame(
ID1 = c("A", "A", "A", "B", "A", "B", "C"),
ID2 = c("B", "C", "B", "C", "B", "A", "B")
)
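One caveat with the factor approach: the integer codes follow the sorted order of the pasted keys, which happens to match the order of first appearance in this data. If the codes should always be numbered by first appearance, a possible sketch (using pmin/pmax for the order-independent key, an assumption not taken from the answer above) is:

```r
df <- data.frame(
  ID1 = c("A", "A", "A", "B", "A", "B", "C"),
  ID2 = c("B", "C", "B", "C", "B", "A", "B"),
  stringsAsFactors = FALSE
)
# Same key for (A, B) and (B, A): smaller name first, larger name second
key <- paste(pmin(df$ID1, df$ID2), pmax(df$ID1, df$ID2))
# match() against the unique keys numbers the pairs in order of first appearance
df$CODE <- match(key, unique(key))
df$CODE
#[1] 1 2 1 3 1 1 3
```

Because only the two ID columns are used to build the key, this also works when the data frame has many other columns.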
Try this:
library(dplyr)
#Code
new <- df %>% rowwise() %>%
mutate(Var = paste0(sort(c(ID1, ID2)), collapse = '')) %>%
group_by(Var) %>%
mutate(CODE=cur_group_id()) %>%
ungroup() %>%
select(-Var)
Output:
# A tibble: 7 x 3
ID1 ID2 CODE
<chr> <chr> <int>
1 A B 1
2 A C 2
3 A B 1
4 B C 3
5 A B 1
6 B A 1
7 C B 3
Some data used:
#Data
df <- structure(list(ID1 = c("A", "A", "A", "B", "A", "B", "C"), ID2 = c("B",
"C", "B", "C", "B", "A", "B")), class = "data.frame", row.names = c(NA,
-7L))

Count number of element for each row in a matrix [duplicate]

Hello, I have a matrix such as:
COL1 COL2 COL3
A "A" "B" NA
B "B" "B" "C"
C NA NA NA
D "B" "B" "B"
E NA NA "C"
F "A" "A" "C"
and I would like, for each row (A, B, C, D, etc.), to get the number of letters that are A or B.
Example:
Nb
A 2
B 2
C 0
D 3
E 0
F 2
Does anyone have an idea?
Another way is to use sapply:
df$n <- sapply(1:nrow(df), function(i) sum((df[i,] %in% c('A', 'B'))))
# COL1 COL2 COL3 n
# A A B <NA> 2
# B B B C 2
# C <NA> <NA> <NA> 0
# D B B B 3
# E <NA> <NA> C 0
# F A A C 2
You can achieve the same output by using purrr::map_dbl as well. Just replace sapply with map_dbl.
You can try a base R solution with apply():
#Base R
df$Var <- apply(df,1,function(x) length(which(!is.na(x) & x %in% c('A','B'))))
Output:
COL1 COL2 COL3 Var
A A B <NA> 2
B B B C 2
C <NA> <NA> <NA> 0
D B B B 3
E <NA> <NA> C 0
F A A C 2
Some data used:
#Data
df <- structure(list(COL1 = c("A", "B", NA, "B", NA, "A"), COL2 = c("B",
"B", NA, "B", NA, "A"), COL3 = c(NA, "C", NA, "B", "C", "C")), row.names = c("A",
"B", "C", "D", "E", "F"), class = "data.frame")
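Since the comparison operators are vectorised over a whole data frame, the row-by-row loop can also be avoided. A minimal sketch, reusing the data above (the column name Nb comes from the question, the intermediate name hits is made up here):

```r
df <- structure(list(COL1 = c("A", "B", NA, "B", NA, "A"),
                     COL2 = c("B", "B", NA, "B", NA, "A"),
                     COL3 = c(NA, "C", NA, "B", "C", "C")),
                row.names = c("A", "B", "C", "D", "E", "F"),
                class = "data.frame")
# Logical matrix: TRUE for A or B, FALSE for other letters, NA for missing cells
hits <- df == "A" | df == "B"
# na.rm = TRUE makes missing cells count as zero
df$Nb <- unname(rowSums(hits, na.rm = TRUE))
df$Nb
#[1] 2 2 0 3 0 2
```

The hits matrix has to be computed before Nb is added, so that the new numeric column is not included in the comparison.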
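Or if you feel curious about tidyverse (note that the final replace() also turns the NA values in the original columns into 0, as the output shows):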
library(tidyverse)
#Code
df %>% mutate(id=1:n()) %>%
left_join(df %>% mutate(id=1:n()) %>%
pivot_longer(cols = -id) %>%
filter(value %in% c('A','B')) %>%
group_by(id) %>%
summarise(Var=n())) %>% ungroup() %>%
replace(is.na(.),0) %>% select(-id)
Output:
COL1 COL2 COL3 Var
1 A B 0 2
2 B B C 2
3 0 0 0 0
4 B B B 3
5 0 0 C 0
6 A A C 2
library(dplyr)
df <- structure(list(COL1 = c("A", "B", NA, "B", NA, "A"), COL2 = c("B",
"B", NA, "B", NA, "A"), COL3 = c(NA, "C", NA, "B", "C", "C")), row.names = c("A",
"B", "C", "D", "E", "F"), class = "data.frame")
df %>%
rowwise() %>%
mutate(sumVar = across(c(COL1:COL3),~ifelse(. %in% c("A", "B"),1,0)) %>% sum)
# A tibble: 6 x 4
# Rowwise:
COL1 COL2 COL3 sumVar
<chr> <chr> <chr> <dbl>
1 A B NA 2
2 B B C 2
3 NA NA NA 0
4 B B B 3
5 NA NA C 0
6 A A C 2

How to apply a single condition on a sequence of multiple columns to create a single column in

I have a dataset which is similar to the following:
Age Monday Tuesday Wednesday
6-9 a b a
6-9 b b c
6-9 c a
9-10 c c b
9-10 c a b
Using R, I want to create a binary variable which indicates whether the entire row contains only "a" (1 if every non-missing value is "a", 0 otherwise), as in the following:
Age Monday Tuesday Wednesday Entire a
6-9 a a 1
6-9 b b c 0
6-9 c a 0
9-10 c c b 0
9-10 a a a 1
Note: My data also contains missing values in the rows. The columns I am interested in are factors.
I used the following code, which did not work:
L <- dataframe %>%
select(Age,Monday:Wednesday) %>%
mutate (Entire a = ifelse(c(Monday:Wednesday)=="a",1,0,na.rm=TRUE))
I'd go with dplyr solution:
library(dplyr)
my.data <- data.frame(
age = c("6-9", "6-9", "6-9", "9-10", "9-10", "9-10"),
Monday = c("a", "b", NA, "c", "a", "a"),
Tuesday = c("a", "b", "a", "c", "a", NA),
Wednesday = c("a", "c", "a", "c", "a", NA)
)
my.data %>%
mutate(
`Entire a` = apply(.[, 2:4], 1, function(x) all(x == "a", na.rm = TRUE) %>% as.numeric)
)
# age Monday Tuesday Wednesday Entire a
# 1 6-9 a a a 1
# 2 6-9 b b c 0
# 3 6-9 <NA> a a 1
# 4 9-10 c c c 0
# 5 9-10 a a a 1
# 6 9-10 a <NA> <NA> 1
The na.rm argument of all() controls whether missing values are ignored.
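A short sketch of how all() treats NA may make that output easier to read, in particular why a row with only one known "a" still gets 1 (the names r1 to r4 are used only for this illustration):

```r
r1 <- all(c("a", NA) == "a")                 # NA: the missing comparison is undecided
r2 <- all(c("a", NA) == "a", na.rm = TRUE)   # TRUE: the NA is dropped first
r3 <- all(c("b", NA) == "a")                 # FALSE: one known mismatch is enough
r4 <- all(c(NA, NA) == "a", na.rm = TRUE)    # TRUE: nothing is left to contradict
c(r1, r2, r3, r4)
#[1]    NA  TRUE FALSE  TRUE
```

The last case means that a row consisting only of missing values would also be flagged as 1 with na.rm = TRUE; whether that is desirable depends on the data.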
We could create a logical matrix with == and get the rowSums to convert to binary
colnm <- names(dataframe)[-1]
dataframe$Entire_a <- +(rowSums(replace(dataframe[colnm],
dataframe[colnm] == '', 'a') == 'a') == length(colnm))
dataframe$Entire_a
#[1] 1 0 0 0 1
Or another option is to paste and then use grep
+(grepl("^a+$", do.call(paste, c(dataframe[colnm], sep=""))))
#[1] 1 0 0 0 1
If the missing value is NA and not blank (''), then use
+(rowSums(replace(dataframe[colnm], is.na(dataframe[colnm]), 'a') == 'a') == 3)
data
dataframe <- structure(list(Age = c("6-9", "6-9", "6-9", "9-10", "9-10"),
Monday = c("a", "b", "", "c", "a"), Tuesday = c("", "b",
"c", "c", "a"), Wednesday = c("a", "c", "a", "b", "a")),
row.names = c(NA,
-5L), class = "data.frame")
We can use pmap_int from purrr for this row-wise operation.
Turn empty values ('') to NA if they are not already.
library(dplyr)
library(purrr)
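dataframe %>%
# na_if() is applied column by column; recent dplyr versions reject a whole data frame here
mutate(across(Monday:Wednesday, ~ na_if(.x, ''))) %>%
mutate(Entire_a = pmap_int(select(., Monday:Wednesday),
~+all(c(...) == 'a', na.rm = TRUE)))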
# Age Monday Tuesday Wednesday Entire_a
#1 6-9 a <NA> a 1
#2 6-9 b b c 0
#3 6-9 <NA> c a 0
#4 9-10 c c b 0
#5 9-10 a a a 1

R: unique combination (avoid a-b and b-a and identical such as a-a, b-b)

I have the following variable columns -
var1 <- c("a", "b", "a", "a", "c", "a", "b", "b", "c", "b", "c", "c", "d")
var2 <- c("a", "a", "b", "c", "a", "d", "b", "c", "b", "d", "c", "d", "d")
mydf <- data.frame(var1, var2)
I want to find unique variable combination, such that
(a) var1 a- var2 b and var1 b- var2 a are not considered unique.
(b) no identical combination are present -
for example var1 a and var2 a, var1 b and var2 b
I used the following code, but it does not give what I expect:
unique(mydf)
var1 var2
1 a a
2 b a
3 a b
4 a c
5 c a
6 a d
7 b b
8 b c
9 c b
10 b d
11 c c
12 c d
13 d d
My expected output is:
var1 var2
1 a b
2 a c
3 a d
4 b c
5 b d
6 c d
thanks;
This should do it:
mydf = mydf[mydf[,1] != mydf[,2], ]
mydf = mydf[!duplicated(data.frame(t(apply(mydf, 1, sort)))), ]
> mydf
var1 var2
2 b a
4 a c
6 a d
8 b c
10 b d
12 c d
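The result above keeps each pair in the orientation in which it first appeared (for example b a rather than a b). If the pairs should come out sorted as in the expected output, one possible sketch uses pmin/pmax instead of apply with sort (the names keep and pairs are made up for this example):

```r
var1 <- c("a", "b", "a", "a", "c", "a", "b", "b", "c", "b", "c", "c", "d")
var2 <- c("a", "a", "b", "c", "a", "d", "b", "c", "b", "d", "c", "d", "d")
# Drop identical combinations such as a-a or b-b
keep <- var1 != var2
# Put the smaller value first so that a-b and b-a become the same row
pairs <- unique(data.frame(
  var1 = pmin(var1, var2)[keep],
  var2 = pmax(var1, var2)[keep],
  stringsAsFactors = FALSE
))
paste(pairs$var1, pairs$var2)
#[1] "a b" "a c" "a d" "b c" "b d" "c d"
```

unique() keeps the row names of the first occurrences, so rownames(pairs) <- NULL can be used afterwards if a clean 1 to 6 index is wanted.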
More of an exercise to teach myself some sets package behavior:
require(sets)
mydf <- data.frame(var1, var2, stringsAsFactors=FALSE) # unneeded factors are a plague on R/S
dlis <- list();
for (i in seq(nrow(mydf)) ) {
if( length(set(mydf[i,1], mydf[i,2]) )==2 ) {
dlis <- c( dlis, list(set(mydf[i,1], mydf[i,2]))
) } }
unique(dlis)
[[1]]
{"a", "b"}
[[2]]
{"a", "c"}
[[3]]
{"a", "d"}
[[4]]
{"b", "c"}
[[5]]
{"b", "d"}
[[6]]
{"c", "d"}
