I have dataframe df as follows:
df <- data.frame(x = c("A", "A", "B", "B"), y = 1:4)
And I have a function that finds the mean of y grouped by x:
generateVarMean <- function(df, x, y) {
mean.df <- df %>%
select(x, y) %>%
group_by(x) %>%
dplyr::summarise(variable.mean = mean(y, na.rm = TRUE))
colnames(mean.df)[2] <- paste0("y", ".mean")
print(mean.df)
}
However, I want to the first argument of paste0() to reflect the actual function argument (i.e. so that it can be used for different dataframes).
Desired functionality:
df1 <- data.frame(a = c("A", "A", "B", "B"), b = 1:4)
generateVarMean(df1, a, b)
a b.mean
1 A 1.5
2 B 3.5
Any help getting pointed in the right direction very much appreciated.
We can make use of the quosure from the devel version of dplyr (soon to be released 0.6.0)
generateVarMean <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
newName <- paste0(quo_name(y), ".mean")
df %>%
select(UQ(x), UQ(y)) %>%
group_by(UQ(x)) %>%
summarise(UQ(newName) := mean(UQ(y), na.rm = TRUE))
}
generateVarMean(df1, a, b)
# A tibble: 2 × 2
# a b.mean
# <fctr> <dbl>
#1 A 1.5
#2 B 3.5
We get the input arguments as quosure with enquo, convert the quosure to string with quo_name to create 'newName' string. Then, evaluate the quosure inside select/group_by/summarise by unquoting (UQ or !!). Note that in the new version, we can also assign the column names directly and using the assign operator (:=)
No need to add anything to the function. Just replace paste0("y", ".mean") with paste0(deparse(substitute(y)), ".mean")
So now the function and the output will be:
> generateVarMean <- function(df, x, y) {
mean.df <- df %>%
select(x, y) %>%
group_by(x) %>%
dplyr::summarise(variable.mean = mean(y, na.rm = TRUE))
colnames(mean.df)[2] <- paste0(deparse(substitute(y)), ".mean")
print(mean.df)
}
> generateVarMean(df, a, b)
# A tibble: 2 × 2
x b.mean
<fctr> <dbl>
1 A 1.5
2 B 3.5
Related
I have 2 dataframes like the following:
df1
colA
A
B
C
D
df2
one two
x A
y A;B
z A;D;C
p E
q F
I want to filter df2 for entries contained in df1. i.e "two" containing values of colA, so that my output will be
one two
x A
y A;B
z A;D;C
I tried all these options that didn't work
df2filtered = df2 %>% filter(two %in% df1$colA)
df2filtered = df2 %>% filter(two %in% str_detect(df1$colA))
df2filtered = df2 %>% select(two, contains(df1$colA))
str_detect with character works but not when given in df like above. What is the right solution?
Here's one way to obtaning the desired output using map to create an extra column to afterwards apply the filter.
library(tidyverse)
df2 %>%
# Use map to check if any string in df1$colA is found in
# df2$two; then use any to check if any entry is T
mutate(stay = map(two, function(x){
any(str_detect(x,df1$colA))
})) %>%
# Filter
filter(stay == T) %>%
# Remove extra column
select(-c(stay))
# one two
#1 x A
#2 y A;B
#3 z A;D;C
Your data is not "tidy". I'd reshape it into a long format. Then, filtering becomes easy.
Below an approach which makes use of an non-exported function of the eye package in order to split the column into an unknown number of columns. (disclaimer: I am the author of this package. The function was inspired and modified from this answer). Then pivot the result longer and filter by the presence in df1$colA. I'd leave the result in a tidy format, but you can of course melt it back to your rather messy shape.
library(tidyverse)
df1 <- read.table(text = "colA
A
B
C
D", header = TRUE)
df2 <- read.table(text = "one two
x A
y A;B
z A;D;C
p E
q F ", header = TRUE)
#install.packages("eye")
eye:::split_mult(df2, "two", pattern = ";" ) %>%
pivot_longer(cols = starts_with("var"), names_to = "var", values_to = "val") %>%
drop_na(val)%>%
select(-var) %>%
group_by(one) %>%
filter(any(val %in% df1$colA))
#> # A tibble: 6 x 2
#> # Groups: one [3]
#> one val
#> <chr> <chr>
#> 1 x A
#> 2 y A
#> 3 y B
#> 4 z A
#> 5 z D
#> 6 z C
Created on 2021-07-14 by the reprex package (v2.0.0)
because this function might change in the future, here for future reference:
split_mult <- function (x, col, pattern = "_", into = NULL, prefix = "var",
sep = "")
{
cols <- stringr::str_split_fixed(x[[col]], pattern, n = Inf)
cols[which(cols == "")] <- NA_character_
m <- dim(cols)[2]
if (length(into) == m) {
colnames(cols) <- into
}
else {
colnames(cols) <- paste(prefix, 1:m, sep = sep)
}
cbind(cols, x[names(x) != col])
}
Another option using str_detect. You can collapse df1$colA so that str_detect searches for A or B or C or D. e.g. "A|B|C|D".
library(tidyverse)
df2 %>% filter(str_detect(two, paste(df1$colA, collapse = '|')))
#> one two
#> 1 x A
#> 2 y A;B
#> 3 z A;D;C
Lets say i have the following data: x <- 1:2.
My desired output is a data.frame() like the following:
a b
1 2
With base R i would do something along:
df <- data.frame(t(x))
colnames(df) <- c("a", "b")
Question: How would i do this with the pipe operator?
What i tried so far:
library(magrittr)
x %>% data.frame(a = .[1], b = .[2])
After the transpose, convert to tibble with as_tibble and change the column names with set_names
library(dplyr)
library(tibble)
x %>%
t %>%
as_tibble(.name_repair = "unique") %>%
setNames(c("a", "b"))
# A tibble: 1 x 2
# a b
# <int> <int>
#1 1 2
Or another option if we want to use the OP's syntax would be to wrap the code with {}
x %>%
{data.frame(a = .[1], b = .[2])}
I'm trying to put a dplyr pipeline in a function but after reading the vignette multiple times as well as the tidy evaluation (https://tidyeval.tidyverse.org/dplyr.html).
I still can't get it to work...
#Sample data:
dat <- read.table(text = "A ID B
1 X 83
2 X NA
3 X NA
4 Y NA
5 X 2
6 Y 2
12 Y 10
7 Y 18
8 Y 85", header = TRUE)
# What I'm trying to do:
x <- dat %>% filter(!is.na(B)) %>% count('ID') %>% filter(freq>3)
x$ID
# Now in a function:
n_occurences <- function(df, n, column){
# Group by ID and return IDs with number of non-na > n in column
column <- enquo(column)
x <- df %>%
filter(!is.na(!!column)) %>%
count('ID') %>% filter(freq>n)
x$ID
}
# Let's try:
col <- 'B'
n_occurences(dat, n=3, column = col)
There is no error, but the output is wrong. This as something to do with the tidy evaluation, but I just can't get my head around it.
With rlang_0.40, we can do this much easier by using the {{...}} or curly-curly operator
library(rlang)
library(dplyr)
n_occurences <- function(df, n1, column){
df %>%
filter(!is.na({{column}})) %>%
count(ID) %>%
filter(n > n1) %>%
pull(ID)
}
n_occurences(dat, n1 = 3, column = B)
#[1] Y
#Levels: X Y
If we intend to pass a quoted string, convert it to symbol (sym) and then do the evaluation (!!)
n_occurences <- function(df, n1, column){
column <- rlang::sym(column)
df %>%
filter(!is.na(!!column)) %>%
count(ID) %>%
filter(n > n1) %>%
pull(ID)
}
col <- 'B'
n_occurences(dat, n1=3, column = col)
#[1] Y
#Levels: X Y
I have a dataframe that looks like this
df = data.frame(id = 1:10, wt = 71:80, gender = rep(1:2, 5), race = rep(1:2, 5))
I'm trying to write a function that takes on a dataframe as a first argument together with any number of arguments that represent column names in that dataframe and use these column names to perform operations on the dataframe. My function would look like this:
library(dplyr)
myFunction <- function(df, ...){
columns <- list(...)
for (i in 1:length(columns)){
var <- enquo(columns[[i]])
df <- df %>% group_by(!!var)
}
df2 = summarise(df, mean = mean(wt))
return(df2)
}
I call the function as the following
myFunction(df, race, gender)
However, I get the following error message:
Error in myFunction(df, race, gender) : object 'race' not found
We can convert the elements in ... to quosures and then do the evaluation (!!!)
myFunction <- function(dat, ...){
columns <- quos(...) # convert to quosures
dat %>%
group_by(!!! columns) %>% # evaluate
summarise(mean = mean(wt))
}
myFunction(df, race, gender)
# A tibble: 2 x 3
# Groups: race [?]
# race gender mean
# <int> <int> <dbl>
#1 1 1 75
#2 2 2 76
myFunction(df, race)
# A tibble: 2 x 2
# race mean
# <int> <dbl>
#1 1 75
#2 2 76
NOTE: In the OP's example, 'race' and 'gender' are the same
If it change it, will see the difference
df <- data.frame(id = 1:10, wt = 71:80, gender = rep(1:2, 5),
race = rep(1:2, each = 5))
myFunction(df, race, gender)
myFunction(df, race)
myFunction(df, gender)
If we decide to pass the arguments as quoted strings, then we can make use of group_by_at
myFunction2 <- function(df, ...) {
columns <- c(...)
df %>%
group_by_at(columns) %>%
summarise(mean= mean(wt))
}
myFunction2(df, "race", "gender")
I am trying to write a function that will (in part) rename a variable by combining its source dataframe and existing variable name. In essence, I want:
df1 <- data.frame(a = 1, b = 2)
to become:
df1 %>%
rename(df1_a = a)
# df1_a b
#1 1 2
But I want to do this programatically, something along the lines of:
fun <- function(df, var) {
outdf <- rename_(df, paste(df, var, sep = "_") = var)
return(outdf)
}
This admittedly naive approach obviously doesn't work, but I haven't been able to figure it out. I'm sure the answer is somewhere in the nse vignette (https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html), but that doesn't seem to address constructing variable names.
Not sure if this is the proper dplyr-esque way, but it'll get you going.
fun <- function(df, var) {
x <- deparse(substitute(df))
y <- deparse(substitute(var))
rename_(df, .dots = with(df, setNames(as.list(y), paste(x, y, sep = "_"))))
}
fun(df1, a)
# df1_a b
# 1 1 2
fun(df1, b)
# a df1_b
# 1 1 2
lazyeval isn't really needed here because the environment of both inputs is known. That being said:
library(lazyeval)
library(dplyr)
library(magrittr)
fun = function(df, var) {
df_ = lazy(df)
var_ = lazy(var)
fun_(df_, var_)
}
fun_ = function(df_, var_) {
new_var_string =
paste(df_ %>% as.character %>% extract(1),
var_ %>% as.character %>% extract(1),
sep = "_")
dots = list(var_) %>% setNames(new_var_string)
df_ %>%
lazy_eval %>%
rename_(.dots = dots)
}
fun(df1, a)