Anonymous function in R using sparklyr spark_apply - r

I am trying to use the spark_apply() function from library(sparklyr) I am using the spark_apply() function because the sparklyr package does not support using subsets. I am a bit lost about where I need to include the function(e) within the following dplyr syntax.
Here is the original syntax I am trying to adapt with an anonymous function (I'm not 100% this is the term)
match_cat3 <- match_cat2 %>%
group_by(VarE, VarF) %>%
mutate(Var_G = if(any(Var_C ==1)) ((VarG - VarG[Var_C ==
1])/(Var_G + Var_G[Var_C == 1])/2) else NA)
Here is my attempt at using the spark_apply() function with the mutate equation from above. I would love some help with how to use the function(e) and where the e goes within the syntax. I don't have any experience using a function within another function like this.
match_cat3 <- spark_apply(
function(e)
match_cat2 %>%
group_by(e$VarE, e$VarF) %>%
mutate(e$Var_G = if(any(e$Var_C ==1)) ((e$VarG -
e$VarG[e$Var_C == 1])/(e$Var_G + e$Var_G[e$Var_C == 1])/2) else NA, e)
)
```
This gives me an out of bounds error.
I was basing the syntax off of the following block from the spark_apply() documentation.
trees_tbl %>%
spark_apply(
function(e) data.frame(2.54 * e$Girth, e),
names = c("Girth(cm)", colnames(trees)))
Thanks!

You seem to be having trouble writing a sparklyr::spark_apply() function. The template that might be more useful for you starts with your Spark DataFrame.
##### data_sf is a Spark DataFrame that will be sent to all workers for R
data_sf <- sparklyr::copy_to(sc, iris, overwrite = TRUE)
data2_sf <- sparklyr::spark_apply(
x = data_sf,
f = function(x) { ##### data_sf will be the argument passed to this x parameter
x$Petal_Length <- x$Petal_Length + 10 ##### data_sf will now be converted to an R object used here (Spark doesn't like `Petal.Length` so automatically changes column names)
return(x)
})
In your case:
you're missing the x argument, the first in sparklyr::spark_apply()
you're bringing in external stuff (match_cat2) through the e argument of your anonymous function but improperly putting it inside the definition of the function as well
you're missing brackets around your multiline expression and so you aren't defining a function
you're trying to use dplyr (and magrittr) with wrong syntax--you can refer to variables like group_by(VarE) not group_by(e$VarE)
Functions are defined as function(data, context) {} where you can provide arbitrary code within the {}. Chapter 11.7 Functions
you're trying to do some conditional stuff in your if else (you could also use the ifelse() function here) but I'm not sure what your intent is
##### Rewritten, maybe helpful?
match_cat3 <- spark_apply(
x = match_cat2, ##### the Spark DataFrame you give to spark_apply()
function(e) { ##### the opening bracket
e %>% ##### the function's argument, NOT `match_cat2 %>%`
group_by(VarE, VarF) %>% ##### remove `e$`
mutate(Var_G = something_good) ##### not sure of your intent here
})

Related

R-Shiny Define function with inputs from ui.R

I use the following function to perform a weighted t-test on a data set.
pvfct <- function(var, weights) {
req(input$groupb)
req(input$sex)
req(input$age)
req(input$education)
if(is.null(input$groupa) == FALSE & is.null(input$groupb) == FALSE & is.null(input$sex) == FALSE & is.null(input$age) == FALSE & is.null(input$education) == FALSE) {
data <- df()
data1 <- data %>%
select(var, group1, weightrake) %>%
filter(group1 == 1)
data2 <- data %>%
select(var, group1, weightrake) %>%
filter(group1 == 2)
result <- wtd.t.test(data1[[var]], data2[[var]], data1[[weights]], data2[[weights]], samedata = FALSE)
result <- as.numeric(result$coefficients[3])
result <- round(result, 2)
result
}
else {}
}
result <- pvfct("Image_Vertrauen_ALLBRANDS_top2", "weightrake")
The function works perfectly fine as long as I define it inside Server.R. But what I want is to define all my functions in the global scope. I guess it has to do something to do with the inputs, since these are reactive?! Can anyone help me?
Why is this technically not working?
It is necessary to define all reactive expressions as part of server part of the code. Global scope can only contain static elements like library calls, data manipulation which once performed remain as is even if input changes. Global scope does not reexecute every time the widget input changes, it is only the server code which changes.
Since, your data filtering is dependent on an input condition, it will have to go inside server to work.
To understand how reactivity in shiny works, I find following articles pretty helpful
https://shiny.rstudio.com/articles/reactivity-overview.html
https://shiny.rstudio.com/articles/understanding-reactivity.html
As part of your code, everytime the function is run, value of input$groupa is looked up [if its false or not], this value lookup is something which global is unable to do and can only be performed by server.

Build a RStudio addin to debug pipe chains

I wrote a function that helps executing pipe chains step by step.
To use it the users has to copy the instruction to clipboard, then execute the function, and move to the console to proceed.
I would like to build an addin that would allow me to select the instructions and run the function with Ctrl + P without the awkward steps.
Ideally, the addin would :
capture the selection
run the function
move the cursor to the console
be triggered by Ctrl + P
I believe it's extremely similar to what the reprex addin is doing but I don't know where to start as I'm 100% new to addins.
I looked into rstudioapi::getActiveDocumentContext() but there was nothing there of interest to me.
How can I make this work ?
The function
debug_pipe <- function(.expr){
.pchain <-
if (missing(.expr)) readClipboard() # windows only , else try clipr::read_clip()
else deparse(substitute(.expr))
.lhs <- if (grepl("^\\s*[[:alnum:]_.]*\\s*<-",.pchain[1])) {
sub("^\\s*([[:alnum:]_.]*)\\s*<-.*","\\1",.pchain[1])
} else NA
.pchain <- sub("[^%]*<-\\s*","",.pchain) # remove lhs of assignment if exists
.pchain <- paste(.pchain,collapse = " ") # collapse
.pchain <- gsub("\\s+"," ",.pchain) # multiple spaces to single
.pchain <- strsplit(.pchain,"\\s*%>%\\s*")[[1]] # split by pipe
.pchain <- as.list(.pchain)
for (i in rev(seq_along(.pchain))) {
# function to count matches
.f <- function(x) sum(gregexpr(x,.pchain[i],fixed = TRUE)[[1]] != -1)
# check if unbalanced operators
.balanced <-
all(c(.f("{"),.f("("),.f("[")) == c(.f("}"),.f(")"),.f("]"))) &
!.f("'") %% 2 &
!.f('"') %% 2
if (!.balanced) {
# if unbalanced, combine with previous
.pchain[[i - 1]] <- paste(.pchain[[i - 1]],"%>%", .pchain[[i]])
.pchain[[i]] <- NULL
}
}
.calls <- Reduce( # build calls to display
function(x,y) paste0(x," %>%\n ",y),
.pchain, accumulate = TRUE)
.xinit <- eval(parse(text = .pchain[1]))
.values <- Reduce(function(x,y){ # compute all values
if (inherits(x,"try-error")) NULL
else try(eval(parse(text = paste("x %>%", y))),silent = TRUE)},
.pchain[-1], .xinit, accumulate = TRUE)
message("press enter to show, 's' to skip, 'q' to quit, lhs can be accessed with `.`")
for (.i in (seq_along(.pchain))) {
cat("\n",.calls[.i])
.rdl_ <- readline()
. <- .values[[.i]]
# while environment is explored
while (!.rdl_ %in% c("q","s","")) {
# if not an assignment, should be printed
if (!grepl("^\\s*[[:alnum:]_.]*\\s*<-",.rdl_)) .rdl_ <- paste0("print(",.rdl_,")")
# wrap into `try` to safely fail
try(eval(parse(text = .rdl_)))
.rdl_ <- readline()
}
if (.rdl_ == "q") return(invisible(NULL))
if (.rdl_ != "s") {
if (inherits(.values[[.i]],"try-error")) {
# a trick to be able to use stop without showing that
# debug_pipe failed in the output
opt <- options(show.error.messages = FALSE)
on.exit(options(opt))
message(.values[[.i]])
stop()
} else
{
print(.)
}
}
}
if (!is.na(.lhs)) assign(.lhs,tail(.values,1),envir = parent.frame())
invisible(NULL)
}
Example code:
library(dplyr)
# copy following 4 lines to clipboard, no need to execute
test <- iris %>%
slice(1:2) %>%
select(1:3) %>%
mutate(x=3)
debug_pipe()
# or wrap expression
debug_pipe(
test <- iris %>%
slice(1:2) %>%
select(1:3) %>%
mutate(x=3)
)
Here are the steps I came with :
Two good ressources were :
The reprex addin's code from Jenny Bryan
This RStudio webinar
1. create a new package
New Project/R package/Name package as pipedebug
2. build R file
Put the function's code into a .R file in the R folder. We rename the function pdbg as I realised that magrittr already has a function called debug_pipe that does something different (it executes browser and returns input).
We must add a second function, without parameter, that the addin will trigger, we can name it however we want:
pdbg_addin <- function(){
selection <- rstudioapi::primary_selection(
rstudioapi::getSourceEditorContext())[["text"]]
rstudioapi::sendToConsole("",execute = F)
eval(parse(text=paste0("pdbg(",selection,")")))
}
The first line captures the selection, adapted from reprex's code.
The second line is sending an empty string to the console and not executing it, that's all I found to move the cursor, but there might be a better way.
The third line is running the main function with the selection as an argument.
3. Create dcf file
Next step is to create file inst/rstudio/addins.dcf with following content:
Name: debug pipe
Description: debug pipes step by step
Binding: pdbg_addin
Interactive: false
usethis::use_addin("pdbg_addin") will create the file, fill it with a template and open it so you can edit it.
4. build package
Ctrl+Shift+B
5. Add shortcut
Tools / addins / browse addins / keyboard shortcuts / debug pipe / Ctrl+P
6. Test it
Copy in text editor / select / Ctrl+P
test <- iris %>%
slice(1:2) %>%
select(1:3) %>%
mutate(x=3)
find a rough version here:
devtools::install_github("moodymudskipper/pipedebug")
?pdbg
similar efforts:
#Alistaire did this and advertised this other effort on his page.

What does builtins(internal = TRUE) return?

From ?builtins:
builtins(TRUE) returns an unsorted list of the names of internal functions, that is those which can be accessed as .Internal(foo(args ...)) for foo in the list.
I don't understand which functions are being returned.
I thought it would be all the closure functions in the base package that call .Internal().
However, the two sets don't match up.
base_objects <- mget(
ls(baseenv(), all.names = TRUE),
envir = baseenv()
)
internals <- names(
Filter(
assertive.types::is_internal_function,
base_objects
)
)
builtins_true <- builtins(internal = TRUE)
c(
both = length(intersect(internals, builtins_true)),
internals_not_builtins_true = length(setdiff(internals, builtins_true)),
builtins_true_not_internals = length(setdiff(builtins_true, internals))
)
## both internals_not_builtins_true builtins_true_not_internals
## 288 125 226
I also thought that it might be the values listed in src/main/names.c in R's source code, and there definitely seems to be some overlap with this, but it isn't exactly this list of values.
What is builtins() doing when you pass internal = TRUE?
Stibu's comment is a specific example of the general problem. ?builtins says that it fetches the names of the objects it returns directly from the symbol table (this is the C symbol table).
And builtins(TRUE) returns all the built-in objects callable via .Internal. That, however, doesn't mean there must be any function that calls .Internal(foo(args, ...)) for any foo.
Stibu gave one example: the internal function may not be called by an R function with the same name, as is the case for many generic functions where the default method calls .Internal.
Another example is something like .addCondHands and .addRestart, which are called by withCallingHandlers and withRestarts, respectively.
It's also possible that one R function calls multiple .Internal functions. I don't know of an example of that off the top of my head though.
After more digging, it seems that the list of functions is everything in the R_FunTab[] object in src/main/names.c where the second digit of the eval column is 1.
Here's a script to retrieve them.
library(stringi)
library(magrittr)
library(dplyr)
names.c <- readLines("https://raw.githubusercontent.com/wch/r-source/56a1b08b7282c5488acb71ee244098f4fd94f7c7/src/main/names.c")
fun_tab <- names.c[92:974] %>%
stri_replace_all_regex("^\\{", "") %>%
stri_replace_all_fixed("{PP", "PP") %>%
stri_replace_all_fixed("}},", "") %>%
stri_replace_all_fixed("\\t", "")
funs <- read.csv(text = fun_tab, header = FALSE, comment.char = "/")
cols <- names.c[86] %>%
stri_sub(4) %>%
stri_split_regex("\\t+") %>%
extract2(1) %>%
stri_trim()
colnames(funs) <- cols
funs$eval <- formatC(funs$eval, width = 3, flag = "0")
# Internal fns have 2nd digit of eval col == 1. See names.c[62:71]
internals <- funs %>% filter_(~ substring(eval, 2, 2) == 1)
I see slight differences when examining
setdiff(internals$printname, builtins(TRUE))
setdiff(builtins(TRUE), internals$printname)
For example builtins(TRUE) doesn't include shell.exec() if you aren't running Windows; mem.limits() was only recently removed from the devel branch of R, so it shows up in builtins(TRUE) for the current release version of R.

Delete data.frame columns and loop through data.frame assignment function

I found the following piece of code here at stackoverflow:
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
x <- x[,!names(x) %in% column.D]
}
return(x)
}
df <- columnFunction(df)
So i wanted to use it for my own proposes, but it did not work out as planned.
What i try to archive is to use it in a for loop or with lapply to use it with multiple data.frames. Amongst others I tried:
d.frame1 <- iris
d.frame2 <- cars
l.frames <- c("d.frame1","d.frame2")
for (b in l.frames){
columnFunction(b)
}
but it yields the following error message:
Error in dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res :
$ operator is invalid for atomic vectors
Well, what i need additionally is that I can loop though that function so that i can iterate through different data.frames.
Last but not least I would need something like:
for (xyz in l.frames){
xyz <- columnFunction(xyz)
}
to automate the saving step.
Does anyone have any idea how i could loop though that function or how i could change the function so that it performs all those steps and is loopable.
I`m quite new to R so perhaps Im missing something obvious.
lapply was designed for this task:
l.frames <- list(d.frame1, d.frame2)
l.frames <- lapply(l.frames, columnFunction)
If you insist on using a for loop:
for (i in seq_along(l.frames)) l.frames[[i]] <- columnFunction(l.frames[[i]])

character string as function argument r

I'm working with dplyr and created code to compute new data that is plotted with ggplot.
I want to create a function with this code. It should take a name of a column of the data frame that is manipulated by dplyr. However, trying to work with columnnames does not work. Please consider the minimal example below:
df <- data.frame(A = seq(-5, 5, 1), B = seq(0,10,1))
library(dplyr)
foo <- function (x) {
df %>%
filter(x < 1)
}
foo(B)
Error in filter_impl(.data, dots(...), environment()) :
object 'B' not found
Is there any solution to use the name of a column as a function argument?
If you want to create a function which accepts the string "B" as an argument (as in you question's title)
foo_string <- function (x) {
eval(substitute(df %>% filter(xx < 1),list(xx=as.name(x))))
}
foo_string("B")
If you want to create a function which accepts captures B as an argument (as in dplyr)
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
eval(substitute(df %>% filter(xx < 1),list(xx=x)))
}
foo_nse(B)
You can find more information in Advanced R
Edit
dplyr makes things easier in version 0.3. Functions with suffixes "_" accept a string or an expression as an argument
foo_string <- function (x) {
# construct the string
string <- paste(x,"< 1")
# use filter_ instead of filter
df %>% filter_(string)
}
foo_string("B")
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
# construct the expression
expression <- lazyeval::interp(quote(xx < 1), xx = x)
# use filter_ instead of filter
df %>% filter_(expression)
}
foo_nse(B)
You can find more information in this vignette
I remember a similar question which was answered by #Richard Scriven. I think you need to write something like this.
foo <- function(x,...)filter(x,...)
What #Richard Scriven mentioned was that you need to use ... here. If you type ?dplyr, you will be able to find this: filter(.data, ...) I think you replace .data with x or whatever. If you want to pick up rows which have values smaller than 1 in B in your df, it will be like this.
foo <- function (x,...) filter(x,...)
foo(df, B < 1)

Resources