I have the following df:
df = data.frame(a = c(0,1,0,0,1),
b= c(0,0,0,1,0),
SL = c(1,0,1,0,0))
df2 = data.frame(a = c(NA,1,NA,0,1),
b= c(NA,0,NA,1,0),
SL = c(NA,0,NA,0,0))
Now I would like to set all values in a row to NA if SL == 1, as in df2. I tried dplyr's mutate(), across(), and mutate_all(), but wasn't successful.
An option with dplyr would be
library(dplyr)
df <- df %>%
mutate(across(everything(), ~ case_when(SL != 1 ~ .x)))
df
# a b SL
#1 NA NA NA
#2 1 0 0
#3 NA NA NA
#4 0 1 0
#5 1 0 0
Using %in%.
df[df$SL %in% 1, ] <- NA
df
# a b SL
# 1 NA NA NA
# 2 1 0 0
# 3 NA NA NA
# 4 0 1 0
# 5 1 0 0
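To see why %in% is the safer choice for the row index, here is a minimal sketch. It uses a hypothetical data frame d whose SL column contains an NA (that is my assumption, the df in the question has no NA in SL):

```r
# Hypothetical data: SL contains an NA, unlike the df in the question
d <- data.frame(a = c(0, 1, 0), b = c(0, 0, 1), SL = c(1, NA, 0))

eq_idx <- d$SL == 1    # the comparison propagates the NA
in_idx <- d$SL %in% 1  # %in% never returns NA

print(eq_idx)  # TRUE NA FALSE
print(in_idx)  # TRUE FALSE FALSE

# The NA-free index can be used directly for row assignment
d[in_idx, ] <- NA
print(d)
```

Only the first row is blanked; the row where SL is NA keeps its other values.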
I need to set the total to NA for each id when all of its item columns are NA.
Here is what my sample dataset looks like:
df <- data.frame(id = c(1,2,3),
i1 = c(1,NA,0),
i2 = c(1,NA,1),
i3 = c(1,NA,0),
total = c(3,0,1))
> df
id i1 i2 i3 total
1 1 1 1 1 3
2 2 NA NA NA 0
3 3 0 1 0 1
For the second id the total should be NA instead of 0, because all of its values are NA. How can I change the dataset to the one below?
> df1
id i1 i2 i3 total
1 1 1 1 1 3
2 2 NA NA NA NA
3 3 0 1 0 1
We could create a condition with if_all inside case_when that returns NA when all the column values in a row are NA, and otherwise computes rowSums with na.rm = TRUE.
library(dplyr)
df %>%
mutate(total = case_when(if_all(i1:i3, is.na) ~ NA_real_,
TRUE ~ rowSums(across(i1:i3), na.rm = TRUE)))
-output
id i1 i2 i3 total
1 1 1 1 1 3
2 2 NA NA NA NA
3 3 0 1 0 1
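If dplyr is not available, the same idea can be sketched in base R. The helper names items and all_na are mine, not from the question:

```r
df <- data.frame(id = c(1, 2, 3),
                 i1 = c(1, NA, 0),
                 i2 = c(1, NA, 1),
                 i3 = c(1, NA, 0))

items <- df[c("i1", "i2", "i3")]

# TRUE for rows in which every item column is NA
all_na <- rowSums(!is.na(items)) == 0

# NA for all-NA rows, otherwise the row sum ignoring NAs
df$total <- ifelse(all_na, NA, rowSums(items, na.rm = TRUE))
print(df)
```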
I have a dataframe with two columns:
df <- data.frame (a = c(NA, 0, NA, NA, NA, NA, 0, 0, NA),
b = c(1, 2, 5, 3, 6, 3, 2, 1, 4))
a b
1 NA 1
2 0 2
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 2
8 0 1
9 NA 4
When the value in column a is 0, I want to replace the value in column b; desired end result is:
a b
1 NA 1
2 0 0
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 0
8 0 0
9 NA 4
I tried various combinations of mutate with ifelse and case_when, and all but one replace all of column b with the values of column a, 0 as well as NA.
Failed attempts:
df %>%
mutate(b = case_when(a == 0 ~ 0))
df %>%
mutate(b = case_when(a == 0 ~ 0,
TRUE ~ as.numeric(as.character(a))))
df %>%
mutate(b = ifelse(a==0, a, b))
All result in:
a b
1 NA NA
2 0 0
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 0 0
8 0 0
9 NA NA
After much consternation, I finally found a solution that produces the result I'm after:
df <- df %>%
mutate(b = ifelse(is.na(a), b, a))
a b
1 NA 1
2 0 0
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 0
8 0 0
9 NA 4
But I'm still perplexed as to why the others did not work as expected. Would love some insight here.
Using %in% instead of == can be useful where there are NA values.
In base R the following will give you what you want.
df$b[df$a %in% 0] <- 0
Using this in dplyr is slightly more complicated than base R, but simpler than the previous solutions:
library(dplyr)
df <- df %>% mutate(b = if_else(a %in% 0, 0, b))
The reason for the problems is that NA == 0 gives NA, not FALSE. NA %in% 0 gives FALSE.
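A small sketch of that difference, using shortened versions of the two columns (the vectors r_eq and r_in are names I made up for illustration):

```r
a <- c(NA, 0, NA)
b <- c(1, 2, 5)

print(a == 0)    # NA TRUE NA
print(a %in% 0)  # FALSE TRUE FALSE

# ifelse() returns NA wherever the test is NA, so b is lost there
r_eq <- ifelse(a == 0, 0, b)

# With %in% the test is never NA, so b is kept
r_in <- ifelse(a %in% 0, 0, b)

print(r_eq)  # NA 0 NA
print(r_in)  # 1 0 5
```

The case_when attempts fail for a related reason: rows where no condition is TRUE fall through to NA or to the fallback, which in the second attempt was a itself.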
A possible solution:
library(dplyr)
df %>%
mutate(b = if_else(a == 0 & !is.na(a), 0, b))
#> a b
#> 1 NA 1
#> 2 0 0
#> 3 NA 5
#> 4 NA 3
#> 5 NA 6
#> 6 NA 3
#> 7 0 0
#> 8 0 0
#> 9 NA 4
Most operations on an NA return NA, so comparing vectors that contain NA gives NA wherever either of the original items was NA. Logical operators are a partial exception: NA & FALSE is FALSE, which is why adding !is.na(a) to the condition works.
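A few quick checks, as a rough illustration of where NA propagates and where it does not (cond is just a name I picked):

```r
print(NA == 0)     # NA
print(NA + 1)      # NA
print(NA & FALSE)  # FALSE, the result does not depend on the unknown value
print(NA | TRUE)   # TRUE, for the same reason

a <- c(NA, 0, 1)

# a == 0 is NA for the first element, but !is.na(a) is FALSE there,
# so the combined condition is FALSE rather than NA
cond <- a == 0 & !is.na(a)
print(cond)  # FALSE TRUE FALSE
```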
If you're willing to eschew dplyr you can do this in base R:
df$b <- ifelse(
is.na(df$a),
df$b,
ifelse(
df$a == 0,
0,
df$b
)
)
I'm trying to flag whether a column's name appears in a string column of the same data frame.
For example, I have a dataframe that looks like the following:
df1 <- data.frame(ID = c('123', '234', '345', '456', '567')
, Types = c('A|B|C|D', 'A|B', 'D|B', 'B|D|C', 'D')
, A = NA
, B = NA
, C = NA
, D = NA)
df1
ID Types A B C D
1 123 A|B|C|D NA NA NA NA
2 234 A|B NA NA NA NA
3 345 D|B NA NA NA NA
4 456 B|D|C NA NA NA NA
5 567 D NA NA NA NA
I'm trying to put a 1 in each column where its name is in the string 'Types' so that the output dataframe looks like
df2 <- data.frame(ID = c('123', '234', '345', '456', '567')
, Types = c('A|B|C|D', 'A|B', 'D|B', 'B|D|C', 'D')
, A = c(1,1,0,0,0)
, B = c(1,1,1,1,0)
, C = c(1,0,0,1,0)
, D = c(1,0,1,1,1))
df2
ID Types A B C D
1 123 A|B|C|D 1 1 1 1
2 234 A|B 1 1 0 0
3 345 D|B 0 1 0 1
4 456 B|D|C 0 1 1 1
5 567 D 0 0 0 1
I was able to do this using this loop
for(j in 3:6)
{
for(i in 1:5)
{
df1[i,j] <- case_when(colnames(df1)[j] %like% df1[i,2] ~ 1, T ~ 0)
}
}
But the actual dataframe I'm using is significantly larger so this loop is very slow. I'm looking for help coming up with a more efficient way of doing this!
Thank you!
We can split the column and use mtabulate
library(qdapTools)
df1[-(1:2)] <- mtabulate(strsplit(df1$Types, "|", fixed = TRUE))
df1
# ID Types A B C D
#1 123 A|B|C|D 1 1 1 1
#2 234 A|B 1 1 0 0
#3 345 D|B 0 1 0 1
#4 456 B|D|C 0 1 1 1
#5 567 D 0 0 0 1
Or using cSplit_e
library(splitstackshape)
cSplit_e(df1[1:2], "Types", "|", type = 'character', fill = 0)
Here is a base R option using strsplit + table + factor
df1[-(1:2)] <- t(sapply(
strsplit(df1$Types, "\\|"),
function(x) table(factor(x, levels = names(df1)[-(1:2)]))
))
which gives
ID Types A B C D
1 123 A|B|C|D 1 1 1 1
2 234 A|B 1 1 0 0
3 345 D|B 0 1 0 1
4 456 B|D|C 0 1 1 1
5 567 D 0 0 0 1
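Another base R sketch, in case exact matching of each name is preferred over tabulating. It loops once per flag column instead of once per cell; type_cols and parts are names I introduced, and I set stringsAsFactors = FALSE so it also runs on R versions before 4.0:

```r
df1 <- data.frame(ID = c("123", "234", "345", "456", "567"),
                  Types = c("A|B|C|D", "A|B", "D|B", "B|D|C", "D"),
                  stringsAsFactors = FALSE)

type_cols <- c("A", "B", "C", "D")

# Split each Types string once, on the literal "|" character
parts <- strsplit(df1$Types, "|", fixed = TRUE)

for (col in type_cols) {
  # 1 if the column name is among the split pieces of that row, else 0
  df1[[col]] <- as.integer(vapply(parts, function(p) col %in% p, logical(1)))
}
print(df1)
```

Because %in% compares whole pieces, a name such as "A" would not match a longer type such as "AB".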
I get the wrong result. What am I doing wrong?
df <- data.frame(x=c(1,1,NA),y=c(1,NA,NA),z=c(NA,NA,NA))
df <-mutate(df,result=ifelse(is.na(x),NA,ifelse(any(!is.na(y),!is.na(z)),1,0)))
I get this (df[2, 4] should be 0):
x y z result
1 1 1 NA 1
2 1 NA NA 1
3 NA NA NA NA
Instead of this:
df_wanted <- data.frame(x=c(1,1,NA),y=c(1,NA,NA),z=c(NA,NA,NA), result=c(1,0,NA))
x y z result
1 1 1 NA 1
2 1 NA NA 0
3 NA NA NA NA
We can use | instead of any, because any returns a single TRUE/FALSE as output:
with(df, any(!is.na(y), !is.na(z)))
#[1] TRUE
That single value gets recycled for the entire column. Because the first ifelse on 'x' already returns NA for the third row, all the other rows return 1.
Instead, we need to evaluate the condition for each row, which can be done with |:
library(dplyr)
df %>%
mutate(result = ifelse(is.na(x), NA, ifelse(!is.na(y)|!is.na(z), 1, 0)))
# x y z result
#1 1 1 NA 1
#2 1 NA NA 0
#3 NA NA NA NA
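To make the difference concrete, here is a short sketch on the y and z columns alone. The rowSums variant is my own addition, as one way to extend the check to many columns:

```r
y <- c(1, NA, NA)
z <- c(NA, NA, NA)

# any() collapses everything to a single value
any_res <- any(!is.na(y), !is.na(z))

# | works element by element, one result per row
or_res <- !is.na(y) | !is.na(z)

# Same row-wise result, written so that more columns can be added easily
row_res <- rowSums(!is.na(cbind(y, z))) > 0

print(any_res)  # TRUE
print(or_res)   # TRUE FALSE FALSE
print(row_res)  # TRUE FALSE FALSE
```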
Or another option is case_when
df %>%
mutate(result = case_when(is.na(x) ~ NA_integer_,
!is.na(y)| !is.na(z) ~ 1L,
TRUE ~ 0L))
# x y z result
#1 1 1 NA 1
#2 1 NA NA 0
#3 NA NA NA NA
Or with coalesce
df %>%
mutate(result = x * +coalesce(!is.na(y)|!is.na(z)))
# x y z result
#1 1 1 NA 1
#2 1 NA NA 0
#3 NA NA NA NA
You can use case_when and mention each condition explicitly.
library(dplyr)
df %>%
mutate(result = case_when(is.na(x) ~ NA_integer_,
!(is.na(y) & is.na(z)) ~ 1L,
TRUE ~ 0L))
# x y z result
#1 1 1 NA 1
#2 1 NA NA 0
#3 NA NA NA NA
I have a large data set and want to replace many NAs, but not all.
In one group I want to replace all NAs with 0.
In the other group I want to replace all NAs with 0, but only in variables whose names do not include a certain string, e.g. 'b'.
Here is an example:
group <- c(1,1,2,2,2)
abc <- c(1,NA,NA,NA,NA)
bcd <- c(2,1,NA,NA,NA)
cde <- c(5,NA,NA,1,2)
df <- data.frame(group,abc,bcd,cde)
group abc bcd cde
1 1 1 2 5
2 1 NA 1 NA
3 2 NA NA NA
4 2 NA NA 1
5 2 NA NA 2
This is what I want:
group abc bcd cde
1 1 1 2 5
2 1 0 1 0
3 2 NA NA 0
4 2 NA NA 1
5 2 NA NA 2
This is what I tried:
#set 0 in first group: this works fine
df[is.na(df) & df$group==1] <- 0
#set 0 in second group but only if the variable name does not include b: does not work
df[is.na(df) & df$group==2 & !grepl('b',colnames(df))] <- 0
dplyr solutions are welcome, as well as base R.
For the second group, create a logical column index with grepl and use it to subset the data while assigning:
j1 <- !grepl('b',colnames(df))
df[j1][df$group == 2 & is.na(df[j1])] <- 0
df
# group abc bcd cde
#1 1 1 2 5
#2 1 0 1 0
#3 2 NA NA 0
#4 2 NA NA 1
#5 2 NA NA 2
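As for why the attempt in the question fails: a logical vector combined with a matrix is recycled down the columns. That lines up with df$group (one value per row) but not with the grepl result (one value per column). A sketch that builds an explicit per-column mask instead (the names na_mat, keep_col and col_mask are mine):

```r
df <- data.frame(group = c(1, 1, 2, 2, 2),
                 abc = c(1, NA, NA, NA, NA),
                 bcd = c(2, 1, NA, NA, NA),
                 cde = c(5, NA, NA, 1, 2))

na_mat <- is.na(df)                     # logical matrix, same shape as df
keep_col <- !grepl("b", colnames(df))   # one flag per column

# Repeat the column flags across rows so the mask has the shape of df
col_mask <- matrix(keep_col, nrow = nrow(df), ncol = ncol(df), byrow = TRUE)

df[na_mat & df$group == 1] <- 0             # group 1: every NA
df[na_mat & df$group == 2 & col_mask] <- 0  # group 2: only columns without "b"
print(df)
```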
Using dplyr::mutate_at you can also do:
library(dplyr)
vars_mutate_1 <- names(df)[-1]
vars_mutate_2 <- grep(x = names(df)[-1], pattern = '^(?!.*b).*$', perl = TRUE, value = TRUE)
df %>%
# funs() is deprecated since dplyr 0.8.0; formula lambdas do the same job
mutate_at(.vars = vars_mutate_1, .funs = ~ if_else(group == 1 & is.na(.), 0, .)) %>%
mutate_at(.vars = vars_mutate_2, .funs = ~ if_else(group == 2 & is.na(.), 0, .))
group abc bcd cde
1 1 1 2 5
2 1 0 1 0
3 2 NA NA 0
4 2 NA NA 1
5 2 NA NA 2
Alternatively, you can use:
library(dplyr)
df2 <- df %>% mutate_at(vars(names(df)[-1]),
function(x) case_when((group==1 & is.na(x) ) ~ 0,
(group==2 & is.na(x) & !grepl("b",deparse(substitute(x)))) ~ 0,
TRUE ~ x))
> df2
group abc bcd cde
1 1 1 2 5
2 1 0 1 0
3 2 NA NA 0
4 2 NA NA 1
5 2 NA NA 2
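With a recent dplyr (I am assuming version 1.0.0 or later, where across() is available), the same logic can be sketched without the superseded mutate_at. The result name res is mine:

```r
library(dplyr)

df <- data.frame(group = c(1, 1, 2, 2, 2),
                 abc = c(1, NA, NA, NA, NA),
                 bcd = c(2, 1, NA, NA, NA),
                 cde = c(5, NA, NA, 1, 2))

res <- df %>%
  mutate(
    # group 1: replace NA in every column except group
    across(-group, ~ if_else(group == 1 & is.na(.x), 0, .x)),
    # group 2: replace NA only in columns whose name has no "b"
    across(!group & !contains("b"), ~ if_else(group == 2 & is.na(.x), 0, .x))
  )
print(res)
```

Selecting the columns with contains() keeps the name test in the column selection, so there is no need to recover the column name inside the function.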