How do I combine mutate_all and ifelse - r

I would like to iterate over all columns of a data.frame with mutate_all() and then selectively change values using ifelse().
testdf <- data.frame("a"=c(1,2,3), "b"=c(4,5,6), "c"=c(7,8,9))
mutate_all(testdf, ifelse(.>9,10,.))
But this does not work. I always get "object '.' not found". How do I refer to the individual values passed through the mutate_all() function? I thought the '.' worked that way? This works:
mutate_all(testdf, funs(.*2))

Try any of these:
testdf %>% mutate_all(function(x) ifelse(x>9,10,x))
testdf %>% mutate_all(funs(ifelse(.>9,10,.)))
testdf %>% mutate_all(~ifelse(.>9,10,.))
testdf %>% mutate_all(~ pmin(., 10))
testdf %>% mutate_all(pmin, 10)
testdf %>% mutate_all(~ replace(., . > 9, 10))
testdf %>% replace(. > 9, 10)
The last two are per Ronak Shah's comment below.
Update
Since this question was asked, dplyr 1.0.0 has come out and introduced a new across() function, which is used with mutate() and is now preferred over the mutate_* functions.
testdf %>% mutate(across(everything(), ~ pmin(., 10)))
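For the original ifelse() question, the same across() pattern can be sketched as follows. This is a minimal illustration, not the answerer's code: the sample data is changed so that one value actually exceeds 9 (an assumption for the demo), and capped is a hypothetical name.

```r
library(dplyr)

# Sample data modified from the question so that one value is above 9
testdf <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 12))

# Inside the ~ lambda, .x stands for the current column
capped <- testdf %>%
  mutate(across(everything(), ~ ifelse(.x > 9, 10, .x)))
```

The bare . in the question fails because mutate_all() expects a function (or a ~ lambda), not an already-evaluated expression.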

Related

Use dplyr to change all values above threshold to NA

I have a data frame of numbers and I want to change all values over 8 to NA. I know there are other ways to do this, but I would like to accomplish this using dplyr so I can use a pipe with other code I have.
df <- data.frame(c(1:9), c(2:10))
This is what I've tried so far:
library(dplyr)
df %>%
mutate(across(everything(), function(x) ifelse(x>8, NA, x)))
df %>%
mutate(across(everything(), function(x) na_if(x >8)))
The pipe %>% only prints the result to the console; it does not modify df. Assign the output back to the original object to keep the changes:
df <- df %>%
mutate(across(everything(), ~ ifelse(. > 8, NA, .)))
Another option is the %<>% assignment pipe from magrittr:
library(magrittr)
df %<>%
mutate(across(everything(), ~ ifelse(. > 8, NA, .)))
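A small sketch of the difference between printing and assigning. The column names x and y and the helper variable unchanged are assumptions added for readability; the question's data frame has no explicit names.

```r
library(dplyr)

df <- data.frame(x = 1:9, y = 2:10)

# Without assignment the result is only returned (and printed); df is untouched
df %>% mutate(across(everything(), ~ ifelse(.x > 8, NA, .x)))
unchanged <- !anyNA(df)

# Assigning the result back is what makes the change stick
df <- df %>% mutate(across(everything(), ~ ifelse(.x > 8, NA, .x)))
```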

Replace value in dataframe non-conditionally with dplyr

What is the dplyr function (if any) for df[df == 2] <- 3?
(i.e. replace all values of 2 in the dataframe df by 3)
With dplyr I could do that as:
df %>% mutate_all(funs(ifelse(.==2, 3, .)))
Is there a function such as recode_all(df, old_value=2, new_value=3)?
We can also use replace
library(dplyr)
df %>%
replace(.< 22, "smaller_22")
That's a pretty good one in my opinion:
df1 <- mtcars[1:4,1:4]
df1 %>% `[<-`(., . < 22, value = "smaller_22")
so in your special case:
df %>% `[<-`(., . == 2, value = 3)
This could work:
df %>% mutate_all(funs(case_when(.==2 ~ 3, TRUE ~ as.numeric(.))))
(not sure why it wants me to change it to numeric, but just putting in . at the end gave an error...)
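As far as I understand, the error arose because case_when() in dplyr versions of that era required every right-hand side to have the same type, so a double 3 could not be mixed with an integer column. A sketch that avoids the issue by combining across() with base replace(), using hypothetical sample data:

```r
library(dplyr)

# Hypothetical data; the question does not show its df
df <- data.frame(a = c(1, 2, 3), b = c(2, 2, 5))

# Equivalent in spirit to df[df == 2] <- 3, but column by column in a pipe
recoded <- df %>%
  mutate(across(everything(), ~ replace(.x, .x == 2, 3)))
```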

group_by by a vector of characters using tidy evaluation semantics

I used to do it, using group_by_
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_(.dots = group_by) %>% summarise(gear = mean(gear))
but now group_by_ is deprecated. I don't know how to do it using the tidy evaluation framework.
New answer
With dplyr 1.0, you can now use selection helpers like all_of() inside across():
df |>
group_by(
across(all_of(my_vars))
)
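Applied to the question's mtcars example, this might look like the sketch below. my_vars and result are illustrative names, and %>% is used instead of |> only so that the code also runs on R versions before 4.1.

```r
library(dplyr)

# Character vector of grouping columns, as in the question
my_vars <- c("cyl", "vs")

result <- mtcars %>%
  group_by(across(all_of(my_vars))) %>%
  summarise(gear = mean(gear), .groups = "drop")
```

all_of() errors if a name in my_vars is missing from the data, which is usually the safer behaviour when the names come from a variable.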
Old answer
Transform the character vector into a list of symbols and splice it in
df %>% group_by(!!!syms(group_by))
There is group_by_at variant of group_by:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_at(group_by) %>% summarise(gear = mean(gear))
The above is a simplified version of the more general form:
mtcars %>% group_by_at(vars(one_of(group_by))) %>% summarise(gear = mean(gear))
Inside vars() you can use any of dplyr's selection helpers:
mtcars %>%
group_by_at(vars(
one_of(group_by) # columns from the predefined set
,starts_with("a") # add columns whose names start with "a"
,-hp # but omit this one
,vs # always include this one
,contains("_gr_") # and columns whose names contain "_gr_"
)) %>%
summarise(gear = mean(gear))

Correct syntax for mutate_if

I would like to replace NA values with zeros via mutate_if in dplyr. The syntax below:
set.seed(1)
mtcars[sample(1:dim(mtcars)[1], 5),
sample(1:dim(mtcars)[2], 5)] <- NA
require(dplyr)
mtcars %>%
mutate_if(is.na,0)
mtcars %>%
mutate_if(is.na, funs(. = 0))
Returns error:
Error in vapply(tbl, p, logical(1), ...) : values must be length 1,
but FUN(X[[1]]) result is length 32
What's the correct syntax for this operation?
I learned this trick from the purrr tutorial, and it also works in dplyr.
There are two ways to solve this problem:
First, define custom functions outside the pipe, and use them in mutate_if():
any_column_NA <- function(x){
any(is.na(x))
}
replace_NA_0 <- function(x){
if_else(is.na(x),0,x)
}
mtcars %>% mutate_if(any_column_NA,replace_NA_0)
Second, use a ~ formula together with . or .x (.x can be replaced with ., but not with any other character or symbol):
mtcars %>% mutate_if(~ any(is.na(.x)),~ if_else(is.na(.x),0,.x))
#This also works
mtcars %>% mutate_if(~ any(is.na(.)),~ if_else(is.na(.),0,.))
In your case, you can also use mutate_all():
mtcars %>% mutate_all(~ if_else(is.na(.x),0,.x))
Using ~, we can define an anonymous function, while .x or . stands for the variable. In the mutate_if() case, . or .x is each column.
The "if" in mutate_if refers to choosing columns, not rows. E.g., mutate_if(data, is.numeric, ...) means to carry out a transformation on all numeric columns in your dataset.
If you want to replace all NAs with zeros in numeric columns:
data %>% mutate_if(is.numeric, funs(ifelse(is.na(.), 0, .)))
mtcars %>% mutate_if(is.numeric, replace_na, 0)
or, with the more recent across() syntax (replace_na() comes from tidyr):
mtcars %>% mutate(across(where(is.numeric),
~ tidyr::replace_na(.x, 0)))
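The error in the question follows from this: the predicate must return a single TRUE or FALSE per column, whereas is.na() returns one value per row (length 32 for mtcars). A minimal sketch with hypothetical data, using only dplyr so that no tidyr dependency is needed:

```r
library(dplyr)

# Hypothetical data with NAs in a numeric and a character column
df <- data.frame(a = c(1, NA, 3), b = c("x", NA, "z"), c = c(4, 5, 6),
                 stringsAsFactors = FALSE)

# where(is.numeric) picks columns; the lambda then works on the values
filled <- df %>%
  mutate(across(where(is.numeric), ~ if_else(is.na(.x), 0, .x)))
```

The character column b is left alone because the predicate does not select it.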
We can use set from data.table
library(data.table)
setDT(mtcars)
for(j in seq_along(mtcars)){
set(mtcars, i= which(is.na(mtcars[[j]])), j = j, value = 0)
}
I always struggle with the replace_na function (which lives in tidyr, not dplyr).
mtcars %>% replace(is.na(.), 0)
This works for me for what you are trying to do.

dplyr change many data types

I have a data.frame:
dat <- data.frame(fac1 = c(1, 2),
fac2 = c(4, 5),
fac3 = c(7, 8),
dbl1 = c('1', '2'),
dbl2 = c('4', '5'),
dbl3 = c('6', '7')
)
To change data types I can use something like
l1 <- c("fac1", "fac2", "fac3")
l2 <- c("dbl1", "dbl2", "dbl3")
dat[, l1] <- lapply(dat[, l1], factor)
dat[, l2] <- lapply(dat[, l2], as.numeric)
with dplyr
dat <- dat %>% mutate(
fac1 = factor(fac1), fac2 = factor(fac2), fac3 = factor(fac3),
dbl1 = as.numeric(dbl1), dbl2 = as.numeric(dbl2), dbl3 = as.numeric(dbl3)
)
is there a more elegant (shorter) way in dplyr?
thx
Christof
Edit (as of 2021-03)
As also pointed out in Eric's answer, mutate_[at|if|all] has been superseded by a combination of mutate() and across(). For reference, I will add the respective counterparts to the examples in the original answer (see below):
# convert all factor to character
dat %>% mutate(across(where(is.factor), as.character))
# apply function (change encoding) to all character columns
dat %>% mutate(across(where(is.character),
function(x){iconv(x, to = "ASCII//TRANSLIT")}))
# substitute all NA in numeric columns
dat %>% mutate(across(where(is.numeric), function(x) tidyr::replace_na(x, 0)))
Original answer
Since Nick's answer is deprecated by now and Rafael's comment is really useful, I want to add this as an Answer. If you want to change all factor columns to character use mutate_if:
dat %>% mutate_if(is.factor, as.character)
Also other functions are allowed. I for instance used iconv to change the encoding of all character columns:
dat %>% mutate_if(is.character, function(x){iconv(x, to = "ASCII//TRANSLIT")})
or to substitute all NA by 0 in numeric columns:
dat %>% mutate_if(is.numeric, function(x){ifelse(is.na(x), 0, x)})
You can use the standard evaluation version of mutate_each (which is mutate_each_) to change the column classes:
dat %>% mutate_each_(funs(factor), l1) %>% mutate_each_(funs(as.numeric), l2)
EDIT - The syntax of this answer has been deprecated, loki's updated answer is more appropriate.
ORIGINAL-
From the bottom of ?mutate_each (at least in dplyr 0.5) it looks like that function, as in @docendo discimus's answer, will be deprecated and replaced with the more flexible alternatives mutate_if, mutate_all, and mutate_at. The one most similar to what @hadley mentions in his comment is probably mutate_at. Note the order of the arguments is reversed compared to mutate_each, and vars() uses select()-like semantics, which I interpret to mean the ?select_helpers functions.
dat %>% mutate_at(vars(starts_with("fac")),funs(factor)) %>%
mutate_at(vars(starts_with("dbl")),funs(as.numeric))
But mutate_at can take column numbers instead of a vars() argument, and after reading through this page, and looking at the alternatives, I ended up using mutate_at but with grep to capture many different kinds of column names at once (unless you always have such obvious column names!)
dat %>% mutate_at(grep("^(fac|fctr|fckr)",colnames(.)),funs(factor)) %>%
mutate_at(grep("^(dbl|num|qty)",colnames(.)),funs(as.numeric))
I was pretty excited about figuring out mutate_at + grep, because now one line can work on lots of columns.
EDIT - now I see matches() among the select_helpers, which handles regex, so now I like this.
dat %>% mutate_at(vars(matches("fac|fctr|fckr")),funs(factor)) %>%
mutate_at(vars(matches("dbl|num|qty")),funs(as.numeric))
Another generally-related comment - if you have all your date columns with matchable names, and consistent formats, this is powerful. In my case, this turns all my YYYYMMDD columns, which were read as numbers, into dates.
mutate_at(vars(matches("_DT$")),funs(as.Date(as.character(.),format="%Y%m%d")))
dplyr's across() function has superseded the _if, _at, and _all variants. See vignette("colwise").
dat %>%
mutate(across(all_of(l1), as.factor),
across(all_of(l2), as.numeric))
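A self-contained sketch of this approach on a trimmed-down copy of the question's data. The reduced column set, the explicit stringsAsFactors = FALSE, and the names converted and classes are assumptions made for the illustration.

```r
library(dplyr)

# Trimmed-down version of the question's data frame
dat <- data.frame(fac1 = c(1, 2), fac2 = c(4, 5),
                  dbl1 = c("1", "2"), dbl2 = c("4", "5"),
                  stringsAsFactors = FALSE)

l1 <- c("fac1", "fac2")  # columns to turn into factors
l2 <- c("dbl1", "dbl2")  # columns to turn into numerics

converted <- dat %>%
  mutate(across(all_of(l1), as.factor),
         across(all_of(l2), as.numeric))

# Inspect the resulting column classes
classes <- sapply(converted, class)
```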
It's a one-liner with mutate_at (l1 and l2 are the character vectors of column names defined in the question):
dat %>% mutate_at(l1, factor) %>% mutate_at(l2, as.numeric)
A more general way of achieving column type transformation is as follows:
If you want to transform all your factor columns to character columns, e.g., this can be done using one pipe:
df %>% mutate_each_( funs(as.character(.)), names( .[,sapply(., is.factor)] ))
Or maybe even simpler with convert from hablar:
library(hablar)
dat %>%
convert(fct(fac1, fac2, fac3),
num(dbl1, dbl2, dbl3))
or combined with tidyselect:
dat %>%
convert(fct(contains("fac")),
num(contains("dbl")))
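For future readers, if you are ok with readr guessing the column types, you can convert the column types of an entire df as if you were originally reading it in with readr and col_guess(). Note that type_convert() comes from readr (loaded with the tidyverse) and only re-parses character columns: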
library(tidyverse)
df %>% type_convert()
Try this
df[,1:11] <- sapply(df[,1:11], as.character)

Resources