I am attempting to pass multiple arguments to Julia's built-in piping operator, |>.
I would like something that works like this:
join([randstring() for i in 1:100], " ")
However, using the piping operator, I get an error instead:
[randstring() for i in 1:100] |> join(" ")
I am pretty sure this is a feature of multiple dispatch: join has its own method, join(strings, delim, [last]), in which delim defaults to "" when omitted.
Am I understanding this correctly? Is there a work around?
For what it is worth, the majority of my uses of piping end up taking more than one argument. For example:
[randstring() for i in 1:100] |> join(" ") |> replace("|", " ")
The piping operator doesn't do anything magical. It simply takes values on the left and applies them to functions on the right. As you've found, join(" ") does not return a function. In general, partial applications of functions in Julia don't return functions — it'll either mean something different (via multiple dispatch) or it'll be an error.
There are a few options that allow you to support this:
Explicitly create anonymous functions:
[randstring() for i in 1:100] |> x->join(x, " ") |> x->replace(x, "|", " ")
Use macros to enable the kind of special magic you're looking for. There are some packages that support this kind of thing.
Metaprogramming to the rescue!
We'll use a simple macro to allow piping to multi-input functions.
using Pipe, Random
@pipe [randstring() for i in 1:100] |> join(_, " ")
So after loading the Pipe package, all we're doing is
using the @pipe macro
designating where to pipe to with the underscore ("_")
[if the function only takes one input we don't need to bother with the underscore:
e.g.
@pipe 2 |> +(3,_) |> *(_,4) |> println
will print "20"]
See here or here for more formal documentation of the Pipe package (not that there's much to document :).
Goodnight everyone.
Why do some functions in R not work with the pipe (%>%), as is the case with unique()?
For example, if I run an object named GOT as follows:
GOT %>%
unique(region)
This does not give me results, but if I do it in the following way:
unique(GOT$region)
the results are displayed.
That doesn't happen with other functions like select() or arrange(). Why is this? Thanks.
The pipe operator passes the left hand side into the first argument of the function in the right hand side (unless specified otherwise). In that sense, it works with every function, including unique.
But not every function works like the dplyr functions you mentioned. select and arrange are part of dplyr (tidyverse) and, like many other functions in that group of packages, they (1) take a data frame as the first argument; and (2) allow you to refer to a column by its bare name in the other arguments.
unique is from base R, and it has neither of the two characteristics above. As the default usage of unique is unique(x, incomparables = FALSE, ...), you're doing unique(GOT, incomparables = region, ...), so the column name isn't being passed to an argument that can comprehend it.
If you really want to use a pipe, you can do:
GOT %>% {unique(.$region)}
GOT %>% pull(region) %>% unique() as @Curt F. added
GOT %>% with(region) %>% unique() as @Ric Villalba added
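To see what the pipe actually hands to unique, here is a minimal sketch in base R. The small GOT data frame is made up for illustration, and the native |> pipe (R 4.1 or later) stands in for %>% so the snippet needs no packages; for this purpose both place the left-hand side in the first argument of the call on the right.

```r
# Hypothetical stand-in for the asker's GOT data frame.
GOT <- data.frame(
  region = c("North", "North", "Dorne"),
  house  = c("Stark", "Stark", "Martell"),
  stringsAsFactors = FALSE
)

# Piping the whole data frame into unique() de-duplicates rows,
# because unique() receives the data frame itself as `x`.
unique_rows <- GOT |> unique()

# Extract the column first, then pipe the resulting vector into unique().
unique_regions <- GOT |> getElement("region") |> unique()

# Same result as the direct call from the question.
direct <- unique(GOT$region)
```

getElement is used here only because it is a base R function that takes the object first and the column name second; pull(region) from dplyr plays the same role in the answer above.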
Why doesn't the pipe operator %>% work in the second example in the following code?
library(magrittr)
# Works
job::job({install.packages("gtsummary")})
# Doesn't work
{install.packages("gtsummary")} %>% job::job()
# Error in code[[1]] : object of type 'symbol' is not subsettable
Is this because the piped object is an expression? I'm not familiar with expressions in R.
Based on the docs of the magrittr library:
Technical notes
The magrittr pipe operators use non-standard evaluation
I think the problem comes from non-standard evaluation (NSE) in R.
As a workaround, you can use the native pipe operator:
{install.packages("gtsummary")} |> job::job()
The problem is that job::job uses non-standard evaluation to grab the code you want to run without running it right away. A more basic example of a function that does this is
nse <- function(x) {
as.character(substitute(x))
}
nse(hello)
# [1] "hello"
hello %>% nse
# [1] "."
So when you run nse(hello), note that hello is not a variable but the function can use substitute() to get the code for the value you pass to the function. So hello is never evaluated, it's converted to a symbol.
When you use the magrittr pipe, the value is not passed as itself. The value from the previous calculation is stored in a variable named ".", which is why we see that name when using the pipe operator. The job::job function is expecting a code block, not just a single symbol, hence the error.
The native |> pipe operator works because it doesn't create the special value variable and it actually re-writes the code you wrote at the abstract syntax tree level. You can see this if you do
quote(hello %>% nse())
# hello %>% nse()
quote(hello |> nse())
# nse(hello)
The |> operator actually re-writes the code, so the pipe no longer exists by the time the call is evaluated.
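The difference can be reproduced in base R alone (R 4.1 or later). The sketch below is not magrittr's real implementation: pipe_like is a hypothetical, heavily simplified imitation that only mimics the one relevant detail, namely that the left-hand value reaches the function through a variable named ".".

```r
# Captures the code of its argument instead of its value.
nse <- function(x) {
  as.character(substitute(x))
}

# Hypothetical, simplified imitation of a magrittr-style pipe: the
# left-hand value is stored in a variable named `.` and the function
# is then called on that variable.
pipe_like <- function(lhs, f) {
  . <- lhs
  f(.)
}

# nse() only ever sees the symbol `.`, not the original code.
via_dot <- pipe_like("some value", nse)

# The native pipe is rewritten by the parser, so nse() sees `hello`.
# `hello` is never evaluated, so it does not need to exist.
via_native <- hello |> nse()

# Both expressions parse to the very same call.
same_call <- identical(quote(hello |> nse()), quote(nse(hello)))
```

Here via_dot holds the name of the placeholder variable while via_native holds the name that was written on the left of the pipe, which is the same contrast as in the output shown above.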
I am trying to turn the Influx query into a function in R so I can change the fields as I see fit. Here is an example of the code I am running
my_bucket <- "my_bucket"
start <- "start_time"
stop <- "stop_time"
q <- paste('from(bucket:',my_bucket,')|> range(start:',start,',stop:',stop,')',sep = "")
data <- client$query(q)
Error in private$.throwIfNot2xx(resp) :
API client error (400): compilation failed: error at #1:1-1:2: invalid statement: '
This particular method uses paste(), and it keeps the escape character \ in the query. I would like to get rid of that. I have tried using cat(), but that is for printing to the console, and I have also tried capture.output() on the cat() string, which still captures the escape characters.
What I would like to see and be stored as an object is the output below. I used cat() to show you exactly what I need (I know I can't use it to store things).
cat('\'from(bucket:\"',my_bucket,'\")|> range(start:',start,',stop:,',stop,')\'', sep = "")
>'from(bucket:"my_bucket")|> range(start:start_time,stop:,stop_time)'
Note the single quotes around the query, beginning at from and ending after the parenthesis that follows stop_time. In addition, the double quotes must be present around the bucket I call. This is required syntax for the query from R.
I would suggest you try sprintf; I find it makes formatting the query properly much easier.
q <- sprintf('from(bucket: "%s") |> range(start: %s, stop: %s)', my_bucket, start, stop)
Anyway, the same can be done with paste:
q <- paste('from(bucket: "',my_bucket,'") |> range(start: ',start,',stop: ',stop,')',sep = "")
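Since the goal was to turn the query into a function, the sprintf version can be wrapped as in the sketch below. The helper name build_flux_query and its argument names are made up for illustration. The last lines also touch on the escape characters from the question: in R, print() displays a double quote inside a string as \", but that backslash is only part of the console display, not of the stored string.

```r
# Hypothetical helper; the name and arguments are illustrative only.
build_flux_query <- function(bucket, start, stop) {
  sprintf('from(bucket: "%s") |> range(start: %s, stop: %s)',
          bucket, start, stop)
}

# Placeholder values taken from the question.
q <- build_flux_query("my_bucket", "start_time", "stop_time")

print(q)       # the console shows \" because print() escapes the quotes
writeLines(q)  # shows the characters that are actually stored

# The stored string contains double quotes but no backslash.
has_backslash <- grepl("\\", q, fixed = TRUE)
```

The resulting q can then be passed to client$query(q) as in the question.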
I would be very surprised if this kind of problem could not be solved with sparklyr:
iris_tbl <- copy_to(sc, aDataFrame)
# date_vector is a character vector of element
# in this format: YYYY-MM-DD (year, month, day)
for (d in date_vector) {
...
aDataFrame %>% mutate(newValue=gsub("-","",d))
...
}
I receive this error:
Error: org.apache.spark.sql.AnalysisException: Undefined function: 'GSUB'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 2 pos 86
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.failFunctionLookup(SessionCatalog.scala:787)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction0(HiveSessionCatalog.scala:200)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction(HiveSessionCatalog.scala:172)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:884)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:884)
at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun
But with this line:
aDataFrame %>% mutate(newValue=toupper("hello"))
things work. Some help?
It may be worth adding that the available documentation states:
Hive Functions
Many of Hive’s built-in functions (UDF) and built-in aggregate functions (UDAF) can be called inside dplyr’s mutate and summarize. The Language Reference UDF page provides the list of available functions.
Hive
As stated in the documentation, a viable solution should be achievable with use of regexp_replace:
Returns the string resulting from replacing all substrings in
INITIAL_STRING that match the java regular expression syntax defined
in PATTERN with instances of REPLACEMENT. For example,
regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some
care is necessary in using predefined character classes: using '\s' as
the second argument will match the letter s; '\\s' is necessary to
match whitespace, etc.
sparklyr approach
Considering the above, it should be possible to combine a sparklyr pipeline with
regexp_replace to achieve an effect equivalent to applying gsub to the desired column. Tested code that removes the - character from the variable d within sparklyr could be built as follows:
aDataFrame %>%
mutate(clnD = regexp_replace(d, "-", "")) %>%
# ...
where class(aDataFrame ) returns: "tbl_spark" ....
I would strongly recommend you read the sparklyr documentation before proceeding. In particular, you're going to want to read the section on how R is translated to SQL (http://spark.rstudio.com/dplyr.html#sql_translation). In short, only a very limited subset of R functions is available for use on sparklyr dataframes, and gsub is not one of those functions (but toupper is). If you really need gsub, you're going to have to collect the data into a local dataframe, then gsub it (you can still use mutate), then copy_to back to Spark.
I am a bit confused by the way arguments are passed to R functions, and by the associated syntax (quoting, substituting, evaluating, calling, expressions, "...", ...).
Basically, what I need to do is to pass arguments in a function using only their name, but without using the type "character".
This is a (not working) illustration of what I would like to do
require(dplyr)
test <- function(x) select(iris, DesiredFunction(x))
test(Species)
I am also interested in general resources about the possibilities to pass arguments to functions.
Thank you,
François
UPDATE
The following is working
require(dplyr)
test <- function(x) select_(iris, substitute(x))
test(Species)
Is there a way to do this but with "select" instead of "select_" ?
Or in other words, what is the inverse operation for quoting ?