Removing a repeated value in a row - r

I have two columns in a data frame that may or may not have copied values in them. If the second column has the same value as the first column, I would like to replace that value with a NULL value or a string indicating the value has been replaced. If the values are different, I want to keep both of those values. For example:
I want to take this
col_1 col_2
a a
a b
b d
c c
c d
c c
a a
And turn this into:
col_1 col_2
a NULL
a b
b d
c NULL
c d
c NULL
a NULL
How can I do that?

You can also try:
#Code
df$col_2 <- ifelse(df$col_2==df$col_1,'NULL',df$col_2)
Output:
df
col_1 col_2
1 a NULL
2 a b
3 b d
4 c NULL
5 c d
Some data used:
#Data
df <- structure(list(col_1 = c("a", "a", "b", "c", "c"), col_2 = c("a",
"b", "d", "c", "d")), class = "data.frame", row.names = c(NA,
-5L))
Another option, using plain R subsetting syntax:
#Code2
df$col_2[df$col_2==df$col_1]<-'NULL'
Same output.
Using the ifelse() approach, we get this:
df
col_1 col_2
1 a NULL
2 a b
3 b d
4 c NULL
5 c d
6 c NULL
7 a NULL
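The replacement above can also be sketched with a real missing value instead of the string 'NULL'. This is a minimal base R sketch on the seven-row data, assuming NA is acceptable as the marker; the name `same` is only illustrative. `which()` is used so that rows where either column is already NA are skipped rather than matched.

```r
# Seven-row example data from the question
df <- data.frame(
  col_1 = c("a", "a", "b", "c", "c", "c", "a"),
  col_2 = c("a", "b", "d", "c", "d", "c", "a"),
  stringsAsFactors = FALSE
)

# Row positions where both columns hold the same value;
# which() drops NA comparisons instead of propagating them
same <- which(df$col_1 == df$col_2)

# Replace the repeated value with a character NA
df$col_2[same] <- NA_character_

df$col_2
```

If the literal string is wanted instead, swapping `NA_character_` for `'NULL'` in the assignment should be enough.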

By NULL value, I assume you mean NA. If you need the actual string NULL, use 'NULL' in place of NA_character_, as in Duck's answer.
library(dplyr)
df %>%
mutate(col_2 = case_when(col_1 == col_2 ~ NA_character_, TRUE ~ col_2))
# A tibble: 5 x 2
# Rowwise:
col_1 col_2
<chr> <chr>
1 a NA
2 a b
3 b d
4 c NA
5 c d
Based on new input:
df %>% mutate(col_2 = case_when(col_1 == col_2 ~ NA_character_, TRUE ~ col_2))
# A tibble: 7 x 2
# Rowwise:
col_1 col_2
<chr> <chr>
1 a NA
2 a b
3 b d
4 c NA
5 c d
6 c NA
7 a NA
Data used:
df
# A tibble: 7 x 2
col_1 col_2
<chr> <chr>
1 a a
2 a b
3 b d
4 c c
5 c d
6 c c
7 a a

We can use data.table, which is fast and efficient:
library(data.table)
setDT(df)[col_1 == col_2, col_2 := 'NULL']
-output
df
# col_1 col_2
#1: a NULL
#2: a b
#3: b d
#4: c NULL
#5: c d
data
df <- structure(list(col_1 = c("a", "a", "b", "c", "c"), col_2 = c("a",
"b", "d", "c", "d")), class = "data.frame", row.names = c(NA,
-5L))

Related

Change value in one column if a value in another is in a list [R]

Hello, I have a df such as:
COL1 COL2
A 1
B 2
C 3
D 4
E 5
F 6
List<-c("A","C")
If a COL1 value is in List, I want to replace the corresponding COL2 value with "OK".
I should then get:
COL1 COL2
A OK
B 2
C OK
D 4
E 5
F 6
Here are the data
structure(list(COL1 = structure(1:6, .Label = c("A", "B", "C",
"D", "E", "F"), class = "factor"), COL2 = 1:6), class = "data.frame", row.names = c(NA,
-6L))
You can use %in% + replace like below
transform(
df,
COL2 = replace(COL2, COL1 %in% List, "OK")
)
which gives
COL1 COL2
1 A OK
2 B 2
3 C OK
4 D 4
5 E 5
6 F 6
A dplyr option
library(dplyr)
df %>%
mutate_at("COL2", ~ replace(., COL1 %in% List, "OK"))
COL1 COL2
1 A OK
2 B 2
3 C OK
4 D 4
5 E 5
6 F 6
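One side effect of the answers above is worth seeing directly: COL2 starts out as an integer column, and writing "OK" into it converts the whole column to character. The following is a small base R sketch of that behavior, rebuilt from the question's data; it is an illustration, not a different method.

```r
# Data from the question: COL1 is a factor, COL2 is integer
df <- data.frame(
  COL1 = factor(c("A", "B", "C", "D", "E", "F")),
  COL2 = 1:6
)
List <- c("A", "C")

class_before <- class(df$COL2)  # "integer"

# Logical index of rows whose COL1 value appears in List
hit <- df$COL1 %in% List

# Assigning a string coerces the entire column to character
df$COL2[hit] <- "OK"

class_after <- class(df$COL2)   # "character"
df
```

If the numbers are needed later for arithmetic, keeping COL2 untouched and storing the flag in a separate column may be the safer choice.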

how to add a column to identify specific combination of values in R?

I have a database with several columns ( >20) and 2 of these columns have the subject names. I would like to add another column with inside a number that identifies the combination of the two subjects.
Here is an example with only the 2 columns of names (I don't include the others for convenience):
ID1 ID2
A B
A C
A B
B C
A B
B A
C B
And here is what I would like to create:
ID1 ID2 CODE
A B 1
A C 2
A B 1
B C 3
A B 1
B A 1
C B 3
I am fairly new to R. I think it can be done with stringr, but I am not sure how.
Thanks for the help!
Simo
df$CODE <- as.integer(
factor(
apply(df[c("ID1", "ID2")], 1, function(x) paste0(sort(x), collapse = ""))
)
)
# ID1 ID2 CODE
# 1 A B 1
# 2 A C 2
# 3 A B 1
# 4 B C 3
# 5 A B 1
# 6 B A 1
# 7 C B 3
Data
df <- data.frame(
ID1 = c("A", "A", "A", "B", "A", "B", "C"),
ID2 = c("B", "C", "B", "C", "B", "A", "B")
)
Try this:
library(dplyr)
#Code
new <- df %>% rowwise() %>%
mutate(Var = paste0(sort(c(ID1, ID2)), collapse = '')) %>%
group_by(Var) %>%
mutate(CODE=cur_group_id()) %>%
ungroup() %>%
select(-Var)
Output:
# A tibble: 7 x 3
ID1 ID2 CODE
<chr> <chr> <int>
1 A B 1
2 A C 2
3 A B 1
4 B C 3
5 A B 1
6 B A 1
7 C B 3
Some data used:
#Data
df <- structure(list(ID1 = c("A", "A", "A", "B", "A", "B", "C"), ID2 = c("B",
"C", "B", "C", "B", "A", "B")), class = "data.frame", row.names = c(NA,
-7L))
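Both answers above sort each pair row by row. A vectorised base R sketch of the same idea is shown here, assuming the two ID columns are character vectors; the names `key` and the `"_"` separator are illustrative choices. Note that `match(key, unique(key))` numbers the pairs in order of first appearance, whereas `factor()` numbers them in sorted order; for this data the two happen to agree.

```r
df <- data.frame(
  ID1 = c("A", "A", "A", "B", "A", "B", "C"),
  ID2 = c("B", "C", "B", "C", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Order-independent key: smaller name first, larger name second
key <- paste(pmin(df$ID1, df$ID2), pmax(df$ID1, df$ID2), sep = "_")

# Number each distinct pair by the order in which it first appears
df$CODE <- match(key, unique(key))

df
```

Because only ID1 and ID2 are referenced, the remaining columns of a wider data frame do not affect the code.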

Count number of element for each row in a matrix [duplicate]

This question already has answers here:
Count number of values in row using dplyr
(5 answers)
Counting number of instances of a condition per row R [duplicate]
(1 answer)
Closed 2 years ago.
Hello, I have a matrix such as:
COL1 COL2 COL3
A "A" "B" NA
B "B" "B" "C"
C NA NA NA
D "B" "B" "B"
E NA NA "C"
F "A" "A" "C"
and I would like, for each row (A, B, C, D, etc.), to get the number of letters that are A or B.
Example:
Nb
A 2
B 2
C 0
D 3
E 0
F 2
Does anyone have an idea?
Another way is to use sapply:
df$n <- sapply(1:nrow(df), function(i) sum((df[i,] %in% c('A', 'B'))))
# COL1 COL2 COL3 n
# A A B <NA> 2
# B B B C 2
# C <NA> <NA> <NA> 0
# D B B B 3
# E <NA> <NA> C 0
# F A A C 2
You can achieve the same output by using purrr::map_dbl as well. Just replace sapply with map_dbl.
You can try a base R solution with apply():
#Base R
df$Var <- apply(df,1,function(x) length(which(!is.na(x) & x %in% c('A','B'))))
Output:
COL1 COL2 COL3 Var
A A B <NA> 2
B B B C 2
C <NA> <NA> <NA> 0
D B B B 3
E <NA> <NA> C 0
F A A C 2
Some data used:
#Data
df <- structure(list(COL1 = c("A", "B", NA, "B", NA, "A"), COL2 = c("B",
"B", NA, "B", NA, "A"), COL3 = c(NA, "C", NA, "B", "C", "C")), row.names = c("A",
"B", "C", "D", "E", "F"), class = "data.frame")
Or if you feel curious about tidyverse:
library(tidyverse)
#Code
df %>% mutate(id=1:n()) %>%
left_join(df %>% mutate(id=1:n()) %>%
pivot_longer(cols = -id) %>%
filter(value %in% c('A','B')) %>%
group_by(id) %>%
summarise(Var=n())) %>% ungroup() %>%
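# fill only the count column, so NAs in the data columns are left intact
mutate(Var = replace_na(Var, 0L)) %>% select(-id)
Output:
COL1 COL2 COL3 Var
1 A B <NA> 2
2 B B C 2
3 <NA> <NA> <NA> 0
4 B B B 3
5 <NA> <NA> C 0
6 A A C 2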
library(dplyr)
df <- structure(list(COL1 = c("A", "B", NA, "B", NA, "A"), COL2 = c("B",
"B", NA, "B", NA, "A"), COL3 = c(NA, "C", NA, "B", "C", "C")), row.names = c("A",
"B", "C", "D", "E", "F"), class = "data.frame")
df %>%
rowwise() %>%
mutate(sumVar = across(c(COL1:COL3),~ifelse(. %in% c("A", "B"),1,0)) %>% sum)
# A tibble: 6 x 4
# Rowwise:
COL1 COL2 COL3 sumVar
<chr> <chr> <chr> <dbl>
1 A B NA 2
2 B B C 2
3 NA NA NA 0
4 B B B 3
5 NA NA C 0
6 A A C 2
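Since the question starts from a matrix, the row-by-row loops above can also be replaced by a single vectorised comparison. This is a minimal base R sketch, with the matrix rebuilt from the question and the names `m` and `nb` chosen for illustration. Comparisons against NA yield NA, so `na.rm = TRUE` makes them count as zero.

```r
# Character matrix from the question, filled column by column
m <- matrix(
  c("A", "B", NA,  "B", NA,  "A",   # COL1
    "B", "B", NA,  "B", NA,  "A",   # COL2
    NA,  "C", NA,  "B", "C", "C"),  # COL3
  ncol = 3,
  dimnames = list(LETTERS[1:6], c("COL1", "COL2", "COL3"))
)

# Logical matrix of cells equal to "A" or "B", summed per row;
# NA cells are dropped from the sum by na.rm = TRUE
nb <- rowSums(m == "A" | m == "B", na.rm = TRUE)

nb
```

The same expression should work on a data frame of character columns as well, because `df == "A"` also returns a logical matrix.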

Wide to long, combining columns in pairs but keeping ID column - R

I have a dataframe of the following type
ID case1 case2 case3 case4
1 A B C D
2 B A
3 E F
4 G C A
5 T
I need to change its format, to a long shape, similar as the below:
ID col1 col2
1 A B
1 A C
1 A D
1 B C
1 B D
1 C D
2 B A
3 E F
4 G C
4 G A
4 C A
5 T
As you can see, I need to maintain the ID and ignore empty columns. There are some cases like T that need to remain in the dataset, but without a col2.
I am honestly not sure how to approach this, so that is why there are no examples of what I have tried.
You can get the data in long format and create all combination of values for each ID if the number of rows is greater than 1 in that ID.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -ID, values_drop_na = TRUE) %>%
group_by(ID) %>%
summarise(value = if(n() > 1) list(setNames(as.data.frame(t(combn(value, 2))),
c('col1', 'col2')))
else list(data.frame(col1 = value[1], col2 = NA_character_))) %>%
unnest(value)
# A tibble: 12 x 3
# ID col1 col2
# <int> <chr> <chr>
# 1 1 A B
# 2 1 A C
# 3 1 A D
# 4 1 B C
# 5 1 B D
# 6 1 C D
# 7 2 B A
# 8 3 E F
# 9 4 G C
#10 4 G A
#11 4 C A
#12 5 T NA
data
df <- structure(list(ID = 1:5, case1 = c("A", "B", "E", "G", "T"),
case2 = c("B", "A", "F", "C", NA), case3 = c("C", NA, NA,
"A", NA), case4 = c("D", NA, NA, NA, NA)),
class = "data.frame", row.names = c(NA, -5L))
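For comparison, the same reshaping can be sketched in base R without pivoting. The helper name `pairs_for_row` is hypothetical, and the sketch assumes the case columns are character and that ID is the first column. Each row is handled separately: NAs are dropped, `combn()` builds the pairs, and a row with a single value is kept with an NA in col2.

```r
df <- data.frame(
  ID = 1:5,
  case1 = c("A", "B", "E", "G", "T"),
  case2 = c("B", "A", "F", "C", NA),
  case3 = c("C", NA, NA, "A", NA),
  case4 = c("D", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all pairs of the non-missing values for one ID
pairs_for_row <- function(id, vals) {
  vals <- vals[!is.na(vals)]
  if (length(vals) < 2) {
    # Keep single-value rows, with nothing in col2
    return(data.frame(ID = id, col1 = vals[1], col2 = NA_character_,
                      stringsAsFactors = FALSE))
  }
  p <- t(combn(vals, 2))  # one pair per row
  data.frame(ID = id, col1 = p[, 1], col2 = p[, 2],
             stringsAsFactors = FALSE)
}

# Apply the helper to every row and stack the pieces
res <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  pairs_for_row(df$ID[i], unlist(df[i, -1], use.names = FALSE))
}))

res
```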

Adding constant to dataframe column conditional on other column

I want to add a constant to rows of a new column that match a certain condition in another column.
My simulated data:
df <- structure(list(var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
var3 = c("c", "c", "c", "c", "d", "c", "c", "a")),
.Names = c("var1", "var2", "var3"),
row.names = c(NA, 8L),
class = "data.frame")
which looks like this:
> df
var1 var2 var3
1 a b c
2 b b c
3 c a c
4 a b c
5 a b d
6 a c c
7 a a c
8 d c a
Now I would like to add a newvar that increases by a value of 1 if var1 equals a, increase it further by 1 if var2 equals b and increase it further by 1 if var3 equals c. That is, my data should look like:
> df
var1 var2 var3 newvar
1 a b c 3
2 b b c 2
3 c a c 1
4 a b c 3
5 a b d 2
6 a c c 2
7 a a c 2
8 d c a 0
I have tried the following, but it will only replace the values with 1, not increase them by 1:
df$newvar[df$var1 == "a"] <- +1
df$newvar[df$var1 == "b"] <- +1
df$newvar[df$var1 == "c"] <- +1
We can use rowwise in dplyr and count the number of conditions that are satisfied for each row.
library(dplyr)
df %>%
rowwise() %>%
mutate(new_var = sum(c(var1 == "a", var2 == "b" , var3 == "c")))
# var1 var2 var3 new_var
# <chr> <chr> <chr> <int>
#1 a b c 3
#2 b b c 2
#3 c a c 1
#4 a b c 3
#5 a b d 2
#6 a c c 2
#7 a a c 2
#8 d c a 0
Or a base R method:
df$new_var <- Reduce("+", list(df$var1 == "a", df$var2 == "b", df$var3 == "c"))
A quick way following your path and using base R is:
df$newVar = 0
df$newVar[df$var1 == "a"] <- df$newVar[df$var1 == "a"] +1
df$newVar[df$var2 == "b"] <- df$newVar[df$var2 == "b"] +1
df$newVar[df$var3 == "c"] <- df$newVar[df$var3 == "c"] +1
Another way that uses ifelse and mutate instead of the rowwise solution above would be:
library(dplyr)
df %>% mutate(newVar = ifelse(var1 == "a",1,0) + ifelse(var2 == "b",1,0) +
ifelse(var3 == "c",1,0))
Then you can adjust the constants to any value you like. If you want to include the new column in your dataframe just assign the result of mutate to your dataframe:
df <- df %>%
mutate(newVar = ifelse(var1 == "a",1,0) + ifelse(var2 ==
"b",1,0) + ifelse(var3 == "c",1,0))
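The remark about adjusting the constants can be illustrated without `ifelse()`, because logical values behave as 0 and 1 in arithmetic. This is a minimal base R sketch; the weights 2, 1 and 5 and the column name `weighted` are made up purely for illustration.

```r
df <- data.frame(
  var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
  var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
  var3 = c("c", "c", "c", "c", "d", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# Each matched condition adds 1 (TRUE counts as 1, FALSE as 0)
df$newVar <- (df$var1 == "a") + (df$var2 == "b") + (df$var3 == "c")

# Same idea with different, purely illustrative constants per condition
df$weighted <- 2 * (df$var1 == "a") + 1 * (df$var2 == "b") + 5 * (df$var3 == "c")

df
```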
We can use rowSums
df$newVar <- rowSums(df == c('a', 'b', 'c')[col(df)])
df$newVar
#[1] 3 2 1 3 2 2 2 0
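The `rowSums` one-liner compares every column of df, so it only gives the intended result while df holds exactly var1, var2 and var3. A sketch that names the columns explicitly, and can therefore be rerun after newVar has been added, might look like this; `targets` and `count_matches` are hypothetical names.

```r
df <- data.frame(
  var1 = c("a", "b", "c", "a", "a", "a", "a", "d"),
  var2 = c("b", "b", "a", "b", "b", "c", "a", "c"),
  var3 = c("c", "c", "c", "c", "d", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# Column name -> value that earns one point (illustrative lookup vector)
targets <- c(var1 = "a", var2 = "b", var3 = "c")

# Hypothetical helper: count matched conditions per row, named columns only
count_matches <- function(data, targets) {
  hits <- mapply(function(column, target) column == target,
                 data[names(targets)], targets)
  rowSums(hits)
}

df$newVar <- count_matches(df, targets)
# Safe to run again: the extra newVar column is ignored
df$newVar <- count_matches(df, targets)

df$newVar
```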
