Use dplyr to change all values above threshold to NA

I have a data frame of numbers and I want to change all values over 8 to NA. I know there are other ways to do this, but I would like to accomplish this using dplyr so I can use a pipe with other code I have.
df <- data.frame(c(1:9), c(2:10))
This is what I've tried so far:
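library(dplyr)
# works, but the result is only printed, not saved back to df
df %>%
mutate(across(everything(), function(x) ifelse(x>8, NA, x)))
# fails: na_if(x, y) needs the value to replace as a second argument,
# and it only matches exact values, not a condition such as x > 8
df %>%
mutate(across(everything(), function(x) na_if(x >8)))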

The first attempt already works: %>% only prints the result to the console and does not modify df. To keep the changes, assign the output back to the original object:
df <- df %>%
mutate(across(everything(), ~ ifelse(. > 8, NA, .)))
Another option is the %<>% assignment pipe from magrittr:
library(magrittr)
df %<>%
mutate(across(everything(), ~ ifelse(. > 8, NA, .)))
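If the goal is only to blank out values, base R's replace() can also be used inside across() instead of ifelse(). A minimal sketch, assuming dplyr >= 1.0.0 and using the question's data with illustrative column names x and y added for readability:

```r
library(dplyr)

# Same data as in the question, with explicit (illustrative) column names
df <- data.frame(x = 1:9, y = 2:10)

# replace() sets every element where the condition is TRUE to NA
# and keeps the column's original type (integer here)
df <- df %>%
  mutate(across(everything(), ~ replace(.x, .x > 8, NA)))
```

As with the ifelse() version, the result has to be assigned back to df for the change to persist.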

Related

replace_na with tidyselect?

Suppose I have a data frame with a bunch of columns where I want to do the same NA replacement:
dd <- data.frame(x = c(NA, LETTERS[1:4]), a = rep(NA_real_, 5), b = c(1:4, NA))
For example, in the data frame above I'd like to do something like replace_na(dd, where(is.numeric), 0) to replace the NA values in columns a and b.
I could do
library(tidyr)
num_cols <- purrr::map_lgl(dd, is.numeric)
r <- as.list(setNames(rep(0, sum(num_cols)), names(dd)[num_cols]))
replace_na(dd, r)
but I'm looking for something tidier/more idiomatic/nicer ...
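To select the columns dynamically with where(is.numeric), wrap replace_na() in across():
library(dplyr)
library(tidyr)
# a lambda is used because passing extra arguments (here 0) through
# across()'s ... is deprecated since dplyr 1.1.0
dd %>%
mutate(across(where(is.numeric), ~ replace_na(.x, 0)))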
Or we can specify replace as a named list of column/value pairs:
replace_na(dd, list(a = 0, b = 0))
This list can be created programmatically: select the numeric columns, map each one to 0 (here with summarise(); converting a name/value tibble with tibble::deframe() is another option), turn the result into a list, and pass it to replace_na():
library(dplyr)
library(tidyr)
dd %>%
select(where(is.numeric)) %>%
summarise(across(everything(), ~ 0)) %>%
as.list() %>% # make sure replace is a plain named list
replace_na(dd, .)
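The deframe() route mentioned above could look like this. A sketch, assuming tibble and tidyr are installed; num_cols and repl are illustrative names:

```r
library(tidyr)
library(tibble)

dd <- data.frame(x = c(NA, LETTERS[1:4]), a = rep(NA_real_, 5), b = c(1:4, NA))

# Names of the numeric columns (a and b here)
num_cols <- names(dd)[vapply(dd, is.numeric, logical(1))]

# deframe() turns a two-column tibble into a named vector c(a = 0, b = 0);
# replace_na() expects a named list, so convert it with as.list()
repl <- as.list(deframe(tibble(name = num_cols, value = 0)))

dd_filled <- replace_na(dd, repl)
```

The character column x is not in repl, so its NA is left untouched.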

How to use mutate rowwise with complex row operation?

How can I use mutate to achieve the below?
bd_diag_date <- df %>%
apply(1, function(dates) last(na.omit(dates))) %>%
as.data.frame() %>%
`colnames<-`("diag_date")
I tried the code below, but it didn't work. I can't figure out why; it says Error: Column 'diagnosis_date' is of unsupported type symbol. Can I assume mutate takes any function that can be applied to a vector? If not, what kinds of operations does it accept?
bd_diag_date <- df %>%
rowwise() %>%
{mutate(., diag_date=last(na.omit(all_vars(.))))}
I also have a more general question: how can I debug this? Every time I run into this kind of problem I search Stack Exchange, but I feel that isn't the right way to improve my dplyr skills.
We can use pmap
library(dplyr)
library(purrr)
df %>%
mutate(diag_date = pmap(., ~ last(na.omit(c(...)))))
If the columns are numeric, we can use pmap_dbl; plain pmap returns a list column:
df %>%
mutate(diag_date = pmap_dbl(., ~ last(na.omit(c(...)))))
#   col1 col2 col3 diag_date
# 1    1   NA    2         2
# 2   NA    2   NA         2
# 3    3    4   NA         4
If we need to return only a single column, use transmute
df %>%
transmute(diag_date = pmap_dbl(., ~ last(na.omit(c(...)))))
Or with group_split and map
df %>%
# .keep replaced the old keep argument in dplyr 1.0.0
group_split(grp = row_number(), .keep = FALSE) %>%
map_dfr(~ .x %>%
transmute(diag_date = last(na.omit(unlist(.)))))
Or using base R with max.col
df$diag_date <- df[cbind(seq_len(nrow(df)), max.col(!is.na(df), 'last'))]
data
df <- data.frame(col1 = c(1, NA, 3), col2 = c(NA, 2, 4), col3 = c(2, NA, NA))
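The asker's rowwise() attempt fails partly because all_vars() is a helper for filter_all()/filter_at(), not a way to get the current row's values. Assuming dplyr >= 1.0.0, c_across() is the rowwise counterpart; a sketch using the data above (vals and res are illustrative names):

```r
library(dplyr)

df <- data.frame(col1 = c(1, NA, 3), col2 = c(NA, 2, 4), col3 = c(2, NA, NA))

res <- df %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into a vector;
  # drop the NAs and take the last remaining value
  mutate(diag_date = {
    vals <- c_across(col1:col3)
    last(vals[!is.na(vals)])
  }) %>%
  ungroup()
```

rowwise() stays close to the original idea but may be slower than pmap_dbl() on large data; ungroup() removes the rowwise grouping afterwards.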

Add multiple columns with mutate using column-based conditions, without using explicit column name + POSIX

I have a data frame in which one column is POSIX (date-time) and the rest are data.
I need to selectively remove some values from a group of columns and add these "new" columns to the original data frame.
I can "easily" do it in base R (I am an old-style user). I'd like to do it more compactly with mutate_at or another function, but I am running into several issues.
A homemade base R solution could be:
df <- data.frame("date" = seq.POSIXt(as.POSIXct(format(Sys.time(),"%F %T"),tz="UTC"),length.out=20,by="min"), "a.1" = rnorm(20,0,3), "a.2" = rnorm(20,1,2), "b.1"= rnorm(20,1,4), "b.2"= rnorm(20,3,4))
df1 <- lapply(df[,grep("^a",names(df))], function(x) replace(x, which(x > 0 & x < 0.2), NA))
df1 <- data.frame(matrix(unlist(df1), nrow = nrow(df), byrow = FALSE)) ## convert to data.frame
names(df1) <- grep("^a", names(df), value = TRUE) ## rename columns
df1 <- cbind.data.frame("date"=df$date, df1) ## add date
Can anyone help me set up something that works with dplyr + transmute?
So far I have come up with something like:
df %>%
select(starts_with("a.")) %>%
transmute(
case_when(
.>0.2 ~ NA,
)
) %>%
cbind.data.frame(df)
But I am stuck, because I can't combine transmute with case_when: all the examples I found use the column names explicitly in case_when, which I can't do since I won't know the column names in advance. I will only know the initial letter of the columns that I need to transmute.
Thanks,
Alex
We can use transmute_at if the intention is to return only the columns specified in vars(), and then bind the date column back:
library(dplyr)
df %>%
transmute_at(vars(starts_with('a')), ~ case_when(. > 0.2~ NA_real_, TRUE~ .)) %>%
bind_cols(df %>% select(date), .)
If all columns should be returned, with only the columns selected in vars() changed, use mutate_at instead of transmute_at:
df %>%
mutate_at(vars(starts_with('a')), ~ case_when(. > 0.2~ NA_real_, TRUE~ .)) %>%
select(date, starts_with('a')) # only needed when keeping just a subset of columns
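Since dplyr 1.0.0, across() supersedes mutate_at()/transmute_at(), and its .names argument can add the cleaned columns next to the originals, which is what the question asks for. A sketch under these assumptions: it follows the question's base R rule (values strictly between 0 and 0.2 become NA; swap in .x > 0.2 to match the answer above), the "_clean" suffix is an illustrative choice, and the random data is regenerated with a fixed seed:

```r
library(dplyr)

set.seed(1)  # reproducible random data for this sketch
df <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
             length.out = 20, by = "min"),
  a.1 = rnorm(20, 0, 3), a.2 = rnorm(20, 1, 2),
  b.1 = rnorm(20, 1, 4), b.2 = rnorm(20, 3, 4)
)

# Only columns whose names start with "a." are touched; .names keeps the
# originals and adds cleaned copies called a.1_clean and a.2_clean
out <- df %>%
  mutate(across(starts_with("a."),
                ~ if_else(.x > 0 & .x < 0.2, NA_real_, .x),
                .names = "{.col}_clean"))
```

Dropping .names would overwrite the a.* columns in place, like mutate_at.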

How do I combine mutate_all and ifelse

I would like to iterate over all columns of a data.frame with mutate_all() and then selectively change values using ifelse().
testdf <- data.frame("a"=c(1,2,3), "b"=c(4,5,6), "c"=c(7,8,9))
mutate_all(testdf, ifelse(.>9,10,.))
But this does not work. I always get "object '.' not found". How do I refer to the individual values passed through the mutate_all() function? I thought '.' worked that way. This works:
mutate_all(testdf, funs(.*2))
Try any of these:
testdf %>% mutate_all(function(x) ifelse(x>9,10,x))
testdf %>% mutate_all(funs(ifelse(.>9,10,.))) # funs() is deprecated since dplyr 0.8.0
testdf %>% mutate_all(~ifelse(.>9,10,.))
testdf %>% mutate_all(~ pmin(., 10))
testdf %>% mutate_all(pmin, 10)
testdf %>% mutate_all(~ replace(., . > 9, 10))
testdf %>% replace(. > 9, 10)
The last two are from Ronak Shah's comment below.
Update
Since this question was asked, dplyr 1.0.0 has come out and introduced across(), which is used inside mutate() and is now preferred over the mutate_* functions:
testdf %>% mutate(across(everything(), ~ pmin(., 10)))
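The "object '.' not found" error comes from passing ifelse(. > 9, 10, .) as a plain expression: R evaluates it immediately, where no object called . exists. Prefixing it with ~ (or writing function(x)) turns it into a function that is called once per column. A sketch with across(), assuming dplyr >= 1.0.0; the data is the question's with c[3] changed to 12 (hypothetical) so the cap actually applies:

```r
library(dplyr)

# Hypothetical data: as in the question, but with one value above 9
testdf <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 12))

# ~ makes a lambda in which .x (or .) is the current column
capped <- testdf %>%
  mutate(across(everything(), ~ ifelse(.x > 9, 10, .x)))
```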

Replace value in dataframe non-conditionally with dplyr

What is the dplyr function (if any) for df[df == 2] <- 3?
(i.e. replace all values of 2 in the dataframe df by 3)
With dplyr I could do that as:
df %>% mutate_all(funs(ifelse(.==2, 3, .)))
Is there a function such as recode_all(df, old_value=2, new_value=3)?
We can also use replace():
library(dplyr)
df %>%
replace(. == 2, 3)
That's a pretty good one in my opinion:
df1 <- mtcars[1:4,1:4]
df1 %>% `[<-`(., . < 22, value = "smaller_22")
So in your specific case:
df %>% `[<-`(., . == 2, value = 3)
This could work:
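df %>% mutate_all(~ case_when(. == 2 ~ 3, TRUE ~ as.numeric(.)))
(The as.numeric() is needed because case_when(), at least before dplyr 1.1.0, requires every right-hand side to have the same type: 3 is a double, so an integer column has to be converted. Putting just . at the end gave an error.)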
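dplyr has no recode_all(), but a hypothetical helper along the lines the question asks for can wrap replace() in across(). A sketch assuming dplyr >= 1.0.0; recode_all, old_value and new_value are names made up for illustration, and the data is hypothetical:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): replace every occurrence of
# old_value with new_value in all columns, usable inside a pipe
recode_all <- function(data, old_value, new_value) {
  data %>%
    mutate(across(everything(), ~ replace(.x, .x %in% old_value, new_value)))
}

df <- data.frame(x = c(1, 2, 3), y = c(2, 2, 5))
out <- recode_all(df, old_value = 2, new_value = 3)
```

%in% is used instead of == because it returns FALSE rather than NA for missing values, keeping the logical index free of NAs, and it also allows several old values at once.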

Resources