Confusing .funs argument to mutate_all in dplyr - r

I do not understand the .funs argument to mutate_all() in the dplyr package. In all likelihood the problem lies with me, but I would like to understand what I am missing.
I often have to recode multiple variables, like sets of likert items.
The sample code below replicates the problem I often have, and my own solution, but to me my solution does not look like the help documentation. So what am I missing?
#Data
var1<-sample(c('A', 'B', 'C'), 100, replace=T)
var2<-sample(c('A', 'B', 'C'), 100, replace=T)
dat<-data.frame(var1, var2)
library(tidyverse)
library(car)
#As per help documentation
dat %>%
mutate_all(., .funs(Recode(., "'A'=1"))) # This doesn't work, generates an error
# This works, but the help documentation does not get you there in any way, unless I am missing something.
dat %>%
mutate_all(., funs(Recode(., "'A'=1")))

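.funs is the name of an argument, not a function, which is why calling .funs(...) fails. As of dplyr 0.8.0, funs() is soft-deprecated: pass .funs a function, a formula-style lambda (~), or a list of either instead of wrapping the call in funs().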
library(dplyr) #v 0.8.3
library(car)
So, either
dat %>%
mutate_all(.funs = ~Recode(., "'A' = 1")) %>%
head(5)
# var1 var2
#1 B C
#2 B C
#3 B C
#4 B 1
#5 C C
Or
dat %>%
mutate_all(~ Recode(., "'A' = 1")) %>%
head(5)
# var1 var2
#1 B C
#2 B C
#3 B C
#4 B 1
#5 C C
Or even without the anonymous function call
dat %>%
mutate_all(Recode, "'A' = 1") %>%
head(5)
# var1 var2
#1 B C
#2 B C
#3 B C
#4 B 1
#5 C C
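The forms that .funs accepts can be compared side by side. The following is a minimal sketch, not the answerer's code: the small fixed data set and the helper a_to_one() are assumptions introduced for illustration (the helper stands in for car::Recode so that only dplyr is needed), and the last line assumes dplyr 1.0.0 or later, where across() supersedes the scoped verbs.

```r
library(dplyr)

# Small fixed data set (an assumption; the question draws random samples).
dat <- data.frame(
  var1 = c("A", "B", "C", "A"),
  var2 = c("C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper standing in for car::Recode(., "'A' = 1").
a_to_one <- function(x, from = "A") ifelse(x == from, "1", x)

# 1. A bare function; extra arguments are passed through `...`.
by_name <- dat %>% mutate_all(a_to_one, from = "A")

# 2. A formula-style lambda, where `.` is the current column.
by_lambda <- dat %>% mutate_all(~ a_to_one(., from = "A"))

# 3. A list of functions or lambdas (the replacement for funs()).
by_list <- dat %>% mutate_all(list(~ a_to_one(., from = "A")))

# 4. The across() spelling, assuming dplyr >= 1.0.0.
by_across <- dat %>% mutate(across(everything(), ~ a_to_one(.x, from = "A")))
```

All four calls should produce the same recoded columns; which spelling to use is mostly a matter of the installed dplyr version.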


Returning all rows that don't contain a certain value

Example data frame:
> df <- data.frame(A = c('a', 'b', 'c'), B = c('c','d','e'))
> df
A B
1 a c
2 b d
3 c e
The following returns all rows in which any value is "c"
> df %>% filter_all(any_vars(. == "c"))
A B
1 a c
2 c e
How do I return the inverse of this, all rows in which no value is ever "c"? In this example, that would be row 2 only. Tidyverse solutions preferred, thanks.
EDIT: To be clear, I am asking about exact matching, I don't care if a value contains a "c", just if the value is exactly "c".
Do you have to use dplyr?
df[rowSums(df == 'c') == 0, ]
# A B
#2 b d
Adding OP's comments into answer
This works for me, thank you. My original issue was that any row with a "c" somewhere also had an NA somewhere else, so the adapted solution is
df[rowSums(df == 'c', na.rm = TRUE) == 0, ]
Honestly this is more readable than dplyr syntax. But as I asked for a dplyr solution, I accepted another answer.
dplyr
FYI, filter_all has been superseded by the use of if_any or if_all.
df %>%
filter(if_all(everything(), ~ . != "c"))
# A B
# 1 b d
library(dplyr)
df <- data.frame(A = c('a', 'b', 'c', NA, 'c'), B = c('c','d','e', 'g', NA))
A B
1 a c
2 b d
3 c e
4 <NA> g
5 c <NA>
df %>% filter_all(all_vars(. != "c" | is.na(.)))
A B
1 b d
2 <NA> g
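The same NA-tolerant filter can also be written with if_any(). This is a sketch under the assumption that dplyr 1.0.4 or later is installed (the release that introduced if_any() and if_all()); it relies on %in% returning FALSE for NA, so no separate is.na() check is needed.

```r
library(dplyr)

df <- data.frame(
  A = c("a", "b", "c", NA, "c"),
  B = c("c", "d", "e", "g", NA),
  stringsAsFactors = FALSE
)

# Keep rows where no column is exactly "c".
# `. %in% "c"` is FALSE for NA, so rows with NA are not dropped by accident.
no_c <- df %>%
  filter(!if_any(everything(), ~ . %in% "c"))

# Base R equivalent from the earlier answer, for comparison.
no_c_base <- df[rowSums(df == "c", na.rm = TRUE) == 0, ]
```

Both results should contain the rows (b, d) and (NA, g).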

Computing ratio between elements with dplyr in R data frame?

Suppose we have a data frame (df) like this:
a b
1 2
2 4
3 6
If I want to compute the ratio of each element in vectors a and b and assign to variable c, we'd do this:
c <- df$a / df$b
However, I was wondering how the same thing could be done using the dplyr package? I.e. are there any ways that this can be achieved using functions from dplyr?
Maybe you can try the code below
df %>%
mutate(c = do.call("/", .))
or
df %>%
mutate(c = Reduce("/", .))
or
df %>%
mutate(c = a/b)
An option with invoke
library(dplyr)
library(purrr)
df %>%
mutate(c = invoke('/', .))
-output
# a b c
#1 1 2 0.5
#2 2 4 0.5
#3 3 6 0.5
data
df <- data.frame(a = c(1,2,3), b= c(2,4,6))
You can use the mutate function from the dplyr package:
df <- data.frame(a = c(1,2,3), b= c(2,4,6))
library(dplyr)
df <- df %>%
dplyr::mutate(c = a/b)
Console output:
a b c
1 1 2 0.5
2 2 4 0.5
3 3 6 0.5
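One point worth checking when choosing between the answers above: do.call("/", .), Reduce("/", .) and invoke('/', .) operate on every column of the data frame, so they depend on df holding exactly the two numeric columns a and b. The sketch below adds a hypothetical third column, id, to illustrate the difference; naming the columns explicitly keeps working.

```r
library(dplyr)

# Same data as above plus a hypothetical extra column `id`.
df <- data.frame(
  a  = c(1, 2, 3),
  b  = c(2, 4, 6),
  id = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Naming the columns is unaffected by the extra column.
res <- df %>% mutate(c = a / b)

# Passing the whole data frame to "/" now supplies three arguments,
# which the operator rejects; try() captures the error for inspection.
whole_df_attempt <- try(do.call("/", df), silent = TRUE)
```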

How can I fill NA-values in a data frame column based on the values from another column? [duplicate]

This question already has an answer here:
Replace NA with mode based on ID attribute
(1 answer)
Closed 2 years ago.
I'd like to fill the NA-values in the F2 column, based on the most common F2 value when grouped by the F1 column.
F1 F2
1 A C
2 B D
3 A NA
4 A C
5 B NA
Desired outcome:
F1 F2
1 A C
2 B D
3 A C
4 A C
5 B D
Thank you for help
Here is a base R solution. First define a function for Mode (Taken from here) and then apply it to your data frame, i.e.
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
df$F2 <- with(df, ave(F2, F1, FUN = function(i) replace(i, is.na(i), Mode(i))))
df
# F1 F2
#1 A C
#2 B D
#3 A C
#4 A C
#5 B D
Here is one way using dplyr :
library(dplyr)
df %>%
group_by(F1) %>%
mutate(F2 = replace(F2, is.na(F2),
names(sort(table(F2), decreasing = TRUE)[1])))
# F1 F2
# <chr> <chr>
#1 A C
#2 B D
#3 A C
#4 A C
#5 B D
In case of ties, preference is given to lexicographic order.
Try this:
First, in df2, I count the non-missing F2 values within each F1 group and keep the value with the highest count. That gives the most common F2 value when grouped by F1. I then join it back onto the original data.frame as F2_fill, use a mutate to fill the missing F2 values from F2_fill, and finally drop F2_fill from the data.frame.
library(tidyverse)
df <- tribble(
~F1, ~F2,
'A', 'C',
'B' , 'D',
'A' ,NA,
'A', 'C',
'B', NA)
df2 <- df %>%
group_by(F1) %>%
count(F2) %>%
filter(!is.na(F2), n == max(n)) %>%
select(-n) %>%
rename(F2_fill = F2)
df3 <- left_join(df,df2, by="F1") %>%
mutate(F2 = ifelse(is.na(F2), F2_fill,F2)) %>%
select(-F2_fill)
You can use ave with table and which.max, subsetting with is.na, when F2 is a character column.
i <- is.na(x$F2)
x$F2[i] <- ave(x$F2, x$F1, FUN=function(y) names(which.max(table(y))))[i]
x
# F1 F2
#1 A C
#2 B D
#3 A C
#4 A C
#5 B D
Data:
x <- data.frame(F1 = c("A", "B", "A", "A", "B")
, F2 = c("C", "D", NA, "C", NA))
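The grouped fill can also be sketched with dplyr's coalesce(). The helper mode_non_na() is a hypothetical name introduced here; unlike the Mode function above, it drops NA before counting, so NA can never be returned as the most common value unless the whole group is missing. Ties go to the value that appears first in the group.

```r
library(dplyr)

df <- data.frame(
  F1 = c("A", "B", "A", "A", "B"),
  F2 = c("C", "D", NA, "C", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value of a character vector.
mode_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)  # group with no observed values
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Within each F1 group, replace missing F2 values with the group's mode.
filled <- df %>%
  group_by(F1) %>%
  mutate(F2 = coalesce(F2, mode_non_na(F2))) %>%
  ungroup()
```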

Reverse of summarise() function in dplyr [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 6 years ago.
Let's consider the following data
data <- data.frame(V1 = c("A","A","A","B","B","C","C"), V2 = c("B","B","B","C","C","D","D"))
> data
V1 V2
1 A B
2 A B
3 A B
4 B C
5 B C
6 C D
7 C D
Now we aggregate data by both columns and obtain
library(dplyr)
group_by(data, V1, V2) %>% summarise(n())
V1 V2 n()
(fctr) (fctr) (int)
1 A B 3
2 B C 2
3 C D 2
Now we want to turn this data back into original data. Is there any function for this procedure?
We can use base R to do this
data1 <- as.data.frame(data1)
data1[rep(1:nrow(data1), data1[,3]),-3]
This is one of the cases where I would opt for base R. Having said that, there are package solutions for this type of problem, i.e. expandRows (a wrapper for the above) from splitstackshape
library(splitstackshape)
data %>%
group_by(V1, V2) %>%
summarise(n=n()) %>%
expandRows(., "n")
Or if we want to stick to a similar option as in base R within %>%
data %>%
group_by(V1, V2) %>%
summarise(n=n()) %>%
do(data.frame(.[rep(1:nrow(.), .$n),-3]))
# V1 V2
# (fctr) (fctr)
#1 A B
#2 A B
#3 A B
#4 B C
#5 B C
#6 C D
#7 C D
data
data1 <- group_by(data, V1, V2) %>% summarise(n())
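Another option not mentioned in the answers above is tidyr's uncount(), which repeats each row according to a weights column and then drops that column. A minimal sketch, assuming tidyr 0.8.0 or later and using count() in place of group_by() plus summarise():

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C", "C"),
  V2 = c("B", "B", "B", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Aggregate: one row per (V1, V2) pair, with the row count in `n`.
counts <- data %>% count(V1, V2)

# Expand back: repeat each row n times; `n` is removed by default.
restored <- counts %>% uncount(n)
```

Note that the row order of the restored data follows the sorted groups, which matches the original here only because the original rows were already sorted.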

separate() in tidyr with NA

I have a question related to separate() in the tidyr package. When there is no NA in a data frame, separate() works. I have been using this function a lot. But, today I had a case in which there were NAs in a data frame. separate() returned an error message. I could be very silly. But, I wonder if tidyr may not be designed for this kind of data cleaning. Or is there any way separate() can work with NAs? Thank you very much for taking your time.
Here is an updated sample based on the comments. Say I want to separate characters in y and create new columns. If I remove the row with NA, separate() will work. But, I do not want to delete the row, what could I do?
library(dplyr)
library(tidyr)
x <- c("a-1","b-2","c-3")
y <- c("d-4","e-5", NA)
z <- c("f-6", "g-7", "h-8")
foo <- data.frame(x,y,z, stringsAsFactors = F)
ana <- foo %>%
separate(y, c("part1", "part2"))
# > foo
# x y z
# 1 a-1 d-4 f-6
# 2 b-2 e-5 g-7
# 3 c-3 <NA> h-8
# > ana <- foo %>%
# + separate(y, c("part1", "part2"))
# Error: Values not split into 2 pieces at 3
One way would be:
res <- foo %>%
mutate(y=ifelse(is.na(y), paste0(NA,"-", NA), y)) %>%
separate(y, c('part1', 'part2'))
res[res=='NA'] <- NA
res
# x part1 part2 z
#1 a-1 d 4 f-6
#2 b-2 e 5 g-7
#3 c-3 <NA> <NA> h-8
You can use the extra option in separate.
Here's an example from Hadley's GitHub issue page
> df <- data.frame(x = c("a", "a b", "a b c", NA))
> df
x
1 a
2 a b
3 a b c
4 <NA>
> df %>% separate(x, c("a", "b"), extra = "merge")
a b
1 a <NA>
2 a b
3 a b c
4 <NA> <NA>
> df %>% separate(x, c("a", "b"), extra = "drop")
a b
1 a <NA>
2 a b
3 a b
4 <NA> <NA>
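Applied to the data from the question, the call might look like the sketch below. It assumes a tidyr release in which separate() passes NA through as NA in every new column, as the issue-page output above shows; the explicit sep = "-" is an addition for clarity, not part of the original call.

```r
library(dplyr)
library(tidyr)

foo <- data.frame(
  x = c("a-1", "b-2", "c-3"),
  y = c("d-4", "e-5", NA),
  z = c("f-6", "g-7", "h-8"),
  stringsAsFactors = FALSE
)

# Split y on "-"; the NA row is kept and yields NA in both new columns.
ana <- foo %>%
  separate(y, c("part1", "part2"), sep = "-")
```

If the installed tidyr still raises the error from the question, the mutate() workaround in the first answer remains an option.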
