Programmatically setting a column name in dplyr::top_n - r

I would like to programmatically set a column name for the dplyr::top_n function.
getSubset <- function(df, t, f) {
df %>%
top_n(t, wt = eval(as.name(f), envir = df))
}
data.frame(x = 1:20, y = 20:1) %>%
getSubset(10, "x")
This fails with Error: object 'f' not found. I tried the lazyeval package, but I keep misunderstanding the concept. Could somebody push me in the right direction? Thanks!

Your problem is that top_n uses non-standard evaluation. It tries to evaluate the expression eval(as.name(f), envir = df) in the context of the data frame df, and f doesn't exist in that environment.
One workaround would be to temporarily add the desired value of wt to the data frame.
getSubset <- function(df, t, f) {
df %>%
mutate(.wt = eval(as.name(f), envir = df)) %>%
top_n(t, wt = .wt) %>%
select(-.wt)
}
data.frame(x = 1:20, y = 20:1) %>%
getSubset(10, "x")
Another approach would be to use interp from the lazyeval package.
getSubset <- function(df, t, f) {
call <- quote(df %>%
top_n(t, wt = .wt))
call <- interp(call, .wt = eval(as.name(f), envir = df))
eval(call)
}
data.frame(x = 1:20, y = 20:1) %>%
getSubset(10, "x")
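With newer dplyr releases the .data pronoun may be a simpler route: .data[[f]] looks up the column whose name is stored in the string f. The sketch below assumes dplyr 1.0.0 or later (the original question does not state a version), and the name getSubset2 is made up for illustration.

```r
library(dplyr)

# Hypothetical variant of getSubset(); assumes dplyr >= 1.0.0 for slice_max()
getSubset2 <- function(df, t, f) {
  df %>%
    # .data[[f]] refers to the column named by the string in f
    slice_max(order_by = .data[[f]], n = t)
}

result <- data.frame(x = 1:20, y = 20:1) %>%
  getSubset2(10, "x")
```

On older dplyr versions the same pronoun should also work inside the original call, as top_n(t, wt = .data[[f]]).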


Recursive magrittr piping / loop in gt package R

I'm trying to apply the data_color() function from the gt package to several columns in my data frame, but each with their own color palette domain. So far, what I have is:
df <- data.frame(Var1 = rnorm(30),
Var2 = rnorm(30),
Var3 = rnorm(30),
Var4 = rnorm(30),
Var5 = rnorm(30),
Var6 = rnorm(30))
mypals <- list()
for (i in 2:6){
mypals[[i]] <- scales::col_bin(colpal,
domain = c(min(df[,i]), max(df[,i])))
}
df %>%
gt() %>%
data_color(columns = 2, colors = mypals[[2]]) %>%
data_color(columns = 3, colors = mypals[[3]]) %>%
data_color(columns = 4, colors = mypals[[4]]) %>%
data_color(columns = 5, colors = mypals[[5]]) %>%
data_color(columns = 6, colors = mypals[[6]])
Is there a way to do a "recursive" piping, something similar to this perhaps?
df %>%
gt() %>% seq(2:6) %>% (function(x){
data_color(columns = x, colors = mypals[[x]])
}
)
Thanks in advance for all your suggestions.
I'm new to the gt package, so forgive me if there's an easier way to do this.
I can't test this answer thoroughly because I can't install the gt package, but I believe you are looking for the accumulate or reduce functions from the purrr package.
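library(purrr)
my_data_color <- \(x, y, z) data_color(x, columns = y, colors = z[[y]])
# Start from the gt table (.init) and apply one data_color() step per column index.
# Only 2:6 is used because mypals[[1]] was never filled in.
reduce(2:6,
~ my_data_color(x = .x,
y = .y,
z = mypals),
.init = df %>% gt())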
From the man page:
reduce() is an operation that combines the elements of a vector into a single value. The combination is driven by .f, a binary function that takes two values and returns a single value: reducing f over 1:3 computes the value f(f(1, 2), 3).
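As a small illustration of that description, here is a sketch that uses only purrr and base R, not gt. The helper add_col and the toy data frame are made up for the example. The second call shows the .init pattern, where an existing object is threaded through a sequence of steps.

```r
library(purrr)

# Plain reduction: computes (1 + 2) + 3
total <- reduce(1:3, `+`)

# With .init, the accumulator starts as an existing object and each
# step returns a modified copy; here one column is added per index.
start <- data.frame(id = 1:3)
add_col <- function(acc, i) {
  acc[[paste0("col", i)]] <- acc$id * i
  acc
}
out <- reduce(2:4, add_col, .init = start)
names(out)
```

The same shape applies when the accumulator is a table object and each step applies one formatting call to it.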
One approach would be to generate your statement as text and use eval(parse(text=<statement>)), as below:
eval(parse(text=paste(
"df %>% gt() %>%",
paste0("data_color(columns=", 2:6, ", colors=mypals[[", 2:6, "]])", collapse=" %>% ")
)))

Can you pipe data into a pairwise.t.test?

I'm wondering if the following code can be simplified to allow the data to be piped directly from the summarise command to the pairwise.t.test, without creating the intermediary object?
data_for_PTT <- data %>%
group_by(subj, TT) %>%
summarise(meanRT = mean(RT))
pairwise.t.test(x = data_for_PTT$meanRT, g = data_for_PTT$TT, paired = TRUE)
I tried x = .$meanRT but it didn't like it, returning:
Error in match.arg(p.adjust.method) :
'arg' must be NULL or a character vector
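You can use curly braces. Because .$meanRT is a nested expression rather than a bare dot argument, the pipe otherwise still inserts the data frame as the first argument; wrapping the call in braces prevents that:
data %>%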
group_by(subj, TT) %>%
summarise(meanRT = mean(RT)) %>%
{pairwise.t.test(x = .$meanRT, g = .$TT, paired = TRUE)}
Reproducible:
df <- data.frame(X1 = runif(1000), X2 = runif(1000), subj = rep(c("A", "B")))
df %>%
{pairwise.t.test(.$X1, .$subj, paired = TRUE)}
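Two alternatives to the braces, sketched under the assumption that the magrittr package is attached: the exposition pipe %$% exposes the columns of the left-hand side by name, and with() does the same using base R. The simulated data and the names res_expose and res_with are for illustration only.

```r
library(magrittr)

set.seed(1)
df <- data.frame(X1 = runif(1000), subj = rep(c("A", "B"), times = 500))

# %$% makes the columns of df visible to the right-hand side expression
res_expose <- df %$% pairwise.t.test(X1, subj, paired = TRUE)

# with() receives df as its first argument from the regular pipe
res_with <- df %>% with(pairwise.t.test(X1, subj, paired = TRUE))
```

Both forms avoid the intermediate object and the repeated .$ prefix.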

Deploying arrange(desc(.)) on each variable passed previously via enquos

Background
Using rlang, I have a simple summary function for dplyr that counts the number of missing observations in a variable for the provided groups. I would like to return the results in descending order of the grouping variables.
Sample data
library("tidyverse")
set.seed(123)
test_data <- tibble(dates = seq.Date(
from = as.Date.character(x = "01-01-2000", format = "%d-%m-%Y"),
to = as.Date.character(x = "31-12-2010", format = "%d-%m-%Y"),
by = "day"
)) %>%
transmute(
t_year = lubridate::year(dates),
t_mnth = lubridate::month(dates),
t_day = lubridate::day(dates),
tst_var = if_else(rnorm(n()) > .8, NA_real_, rnorm(n()))
)
Summary function
Working version
quick_smry <- function(df, x, ...) {
group_by_vars <- enquos(...)
check_var <- enquo(x)
df %>%
group_by(!!!group_by_vars) %>%
summarise(num_missing = sum(is.na(!!check_var)))
}
Desired results
test_data %>%
group_by(t_year, t_mnth) %>%
summarise(num_missing = sum(is.na(tst_var))) %>%
arrange(desc(t_year), desc(t_mnth))
Problem
Implementing the arrange(desc(x)) call so that it handles each of the variables initially passed via enquos. For example, if five grouping variables are passed, the result should be equivalent to arrange(desc(var1), ..., desc(var5)).
Attempt
Naturally, this doesn't work:
quick_smry <- function(df, x, ...) {
group_by_vars <- enquos(...)
check_var <- enquo(x)
df %>%
group_by(!!!group_by_vars) %>%
summarise(num_missing = sum(is.na(!!check_var))) %>%
# Desc call should be created for each of the group variables
arrange(desc(!!!group_by_vars))
}
You can use arrange_at like this:
quick_smry <- function(df, x, ...) {
group_by_vars <- enquos(...)
check_var <- enquo(x)
df %>%
group_by(!!!group_by_vars) %>%
summarise(num_missing = sum(is.na(!!check_var))) %>%
arrange_at(group_by_vars, desc)
}
quick_smry(test_data, tst_var, t_year, t_mnth)
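Another option, sketched here rather than taken from the answer above, is to build one desc() call per captured variable and splice the calls into arrange(). The name quick_smry2 and the toy data are made up for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(rlang)

# Hypothetical variant of quick_smry(); only the arrange() step differs
quick_smry2 <- function(df, x, ...) {
  group_by_vars <- enquos(...)
  check_var <- enquo(x)
  # Wrap each captured grouping variable in a desc() call
  desc_calls <- lapply(group_by_vars, function(q) expr(desc(!!q)))
  df %>%
    group_by(!!!group_by_vars) %>%
    summarise(num_missing = sum(is.na(!!check_var)), .groups = "drop") %>%
    arrange(!!!desc_calls)
}

toy <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c("a", "b", "a", "b", "b"),
  v  = c(NA, 1, 2, NA, NA)
)
res <- quick_smry2(toy, v, g1, g2)
```

This keeps the sorting explicit, so it does not rely on the scoped arrange_at() variant.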

Can't use !!arg in dplyr for mutate call

I can use !! to filter by a user-given variable but not to modify that same variable. The following function throws an error when created, but it works just fine if I delete the mutate call.
avg_dims <- function(x, y) {
y <- enquo(y)
x %>%
filter(!!y != "TOTAL") %>%
mutate(!!y = "MEAN") %>%
group_by(var1, var2)
}
To set a column name programmatically on the left-hand side, use the := operator instead of =. The expression !!y = "MEAN" is not valid R syntax, which is why the function fails as soon as it is defined. The name must also be a string or a symbol, so convert the quosure (y from enquo) to a string with quo_name and then unquote it with !!.
avg_dims <- function(x, y) {
y <- enquo(y)
y1 <- rlang::quo_name(y)
x %>%
filter(!!y != "TOTAL") %>%
mutate(!!y1 := "MEAN") %>%
group_by(var1, var2)
}
avg_dims(df1, varN)
data
set.seed(24)
df1 <- data.frame(var1 = rep(LETTERS[1:3], each = 4),
var2 = rep(letters[1:2], each = 6),
varN = sample(c("TOTAL", "hello", 'bc'), 12, replace = TRUE),
stringsAsFactors = FALSE)
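For reference, the same function can probably be written more compactly with the embrace operator. This is a sketch that assumes dplyr 1.0.0 or later; the name avg_dims2 is hypothetical, and the data is a deterministic stand-in for df1 so that the result is predictable.

```r
library(dplyr)

# Hypothetical rewrite of avg_dims(); assumes dplyr >= 1.0.0
avg_dims2 <- function(x, y) {
  x %>%
    filter({{ y }} != "TOTAL") %>%
    # "{{ y }}" on the left of := expands to the name of the column passed as y
    mutate("{{ y }}" := "MEAN") %>%
    group_by(var1, var2)
}

df1 <- data.frame(var1 = rep(LETTERS[1:3], each = 4),
                  var2 = rep(letters[1:2], each = 6),
                  varN = rep(c("TOTAL", "hello", "bc"), times = 4),
                  stringsAsFactors = FALSE)
res <- avg_dims2(df1, varN)
```

Here {{ y }} replaces the enquo() plus !! pair, so no intermediate y1 string is needed.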

How to apply a "complicated" user defined function on each element of a tibble

I have searched high and low for the answer to this (seemingly simple) problem, but came up empty so I hope someone can help me or point me in the right direction.
I have a fairly complicated submodel that I want to apply to a dataset, but if I just use mutate I get the error Variables must be length 1 or 21. Adding rowwise() doesn't seem to change that.
Let me use the following silly illustration of the problem:
myData <- tibble(x=10:20, y=c("a", "b","a", "b","a", "b","a", "b","a", "b","a"))
staticData <- tibble(x=0:100, y=c("a"),f=x/100) %>% union (tibble(x=0:100, y=c("b"),f=x/1000))
ComplicatedFunction <- function(mystaticData, myx, myy) {
#make the base table
myBaseTable <- tibble(
y = myy,
x = c(myx:(myx + 20))
)
#add f rates
myBaseTable <- left_join(myBaseTable,mystaticData)
#add stuff
myBaseTable <- myBaseTable %>%
mutate(z = 1 - (f * 0.8)) %>%
mutate(zCumulative = cumprod(z))
#Calculate the thing
myCalculatedThing <- sum(myBaseTable$zCumulative)
return(myCalculatedThing)
}
#This is what I want to do
myData %>% mutate(newcol = ComplicatedFunction(mystaticData = staticData,
myx = x,
myy = y))
#this works
ComplicatedFunction(mystaticData = staticData,
myx = 19,
myy = "b")
ComplicatedFunction(mystaticData = staticData,
myx = 20,
myy = "a")
#This works (but would be silly as I want the function to be evaluated for each line)
myData %>% mutate(newcol = ComplicatedFunction(mystaticData = staticData,
myx = 15,
myy = "a"))
#This no longer works, but I don't understand what I am doing wrong
myData %>% mutate(newcol = ComplicatedFunction(mystaticData = staticData,
myx = x,
myy = "a"))
#I tried rowwise(), but this doesn't seem to work either
myData %>% rowwise() %>% mutate(newcol = ComplicatedFunction(mystaticData = staticData,
myx = x,
myy = y))
I hope someone can explain to me what I am doing wrong here.
Many thanks in advance!
Sylvain
You can do it by creating a new function using partial:
library(purrr)
newCF <- partial(ComplicatedFunction, mystaticData = staticData)
myData %>% rowwise() %>% mutate(newcol = newCF(myx = x,
myy = y))
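An alternative that avoids rowwise() is to iterate over the two columns explicitly with purrr::map2_dbl, which calls the function once per (x, y) pair and collects the scalar results into a numeric vector. This is a sketch: the function body is a condensed restatement of ComplicatedFunction from the question, bind_rows stands in for union, and the explicit by argument in the join is an addition.

```r
library(dplyr)
library(purrr)

myData <- tibble(x = 10:20, y = rep(c("a", "b"), length.out = 11))
staticData <- bind_rows(
  tibble(x = 0:100, y = "a", f = x / 100),
  tibble(x = 0:100, y = "b", f = x / 1000)
)

# Condensed version of the question's function: expects a single x and a single y
ComplicatedFunction <- function(mystaticData, myx, myy) {
  tibble(y = myy, x = myx:(myx + 20)) %>%
    left_join(mystaticData, by = c("x", "y")) %>%
    mutate(z = 1 - f * 0.8, zCumulative = cumprod(z)) %>%
    pull(zCumulative) %>%
    sum()
}

# One call per row; .x and .y are the current elements of x and y
result <- myData %>%
  mutate(newcol = map2_dbl(x, y, ~ ComplicatedFunction(staticData, .x, .y)))
```

The plain mutate call fails because the whole x and y columns are passed in at once, so myx:(myx + 20) and myy no longer describe a single row.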
