R - Access an object's own name in apply functions - r

This is a problem I often run into: I try to access an object's own name when using a function from the apply family and spend hours figuring out how to do it. For instance (this is not the core of my question), today I wanted to inspect an attached package to find out whether it contained any non-function objects. After a lot of trial and error, I finally came up with the following (for the rrapply package; I know looking at the documentation is also easy, but this case illustrates the problem well):
library(rrapply)
library(magrittr)  # provides %>%
eapply(rlang::pkg_env('rrapply'), function(x) {if(!is.function(x)) x}) %>%
  `[`(sapply(., function(x) !is.null(x))) %>%
  names()
## [1] "renewable_energy_by_country" "pokedex"
This feels far too complicated for such a simple test!
So my question: is there an easy way to loop through an object in base R (or maybe the tidyverse) and return only the names of the elements that meet a certain condition? rrapply seems able to do that, but:
it is fairly complicated
and it seems to work only on lists, and it also recurses into all sub-elements, which is not what I want
Thanks!

Identify the environment of interest, e, then use eapply with the indicated function and take the names of the matching elements at the end. This isn't conceptually different from the code in the question, but it is somewhat less complex when done in base R as follows:
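library(rrapply)  # attaches the package so that "package:rrapply" exists
e <- as.environment("package:rrapply")
# Filter(`!`, ...) keeps the elements whose is.function() result is FALSE
names(Filter(`!`, eapply(e, is.function)))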
or the same code written as a pipeline:
library(magrittr)
"package:rrapply" %>%
as.environment %>%
eapply(is.function) %>%
Filter(`!`, .) %>%
names
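The same pattern generalizes beyond package environments. Below is a minimal sketch of a small helper, names_where() (a hypothetical name, not from base R or any package), that returns the names of the top-level elements satisfying a predicate. It assumes the predicate returns a single TRUE/FALSE per element, works on lists, data frames and environments, and, unlike rrapply, does not recurse into sub-elements.

```r
# names_where() is a hypothetical helper, not part of base R or any package.
# It returns the names of the top-level elements of `x` for which `pred`
# returns TRUE, without descending into nested elements.
names_where <- function(x, pred) {
  if (is.environment(x)) {
    # eapply() returns a named list of results in no particular order,
    # so the matching names are sorted to give a stable result
    keep <- unlist(eapply(x, pred))
    return(sort(names(keep)[keep]))
  }
  # Filter() keeps the list elements (or data frame columns) where pred is TRUE
  names(Filter(pred, x))
}

# Columns of a data frame that are numeric
names_where(iris, is.numeric)

# Non-function objects in an attached package (rrapply must be attached first):
# names_where(as.environment("package:rrapply"), Negate(is.function))
```

Negate(is.function) plays the same role as Filter(`!`, ...) in the answer above, just expressed as a predicate on the elements themselves.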

Related

What distinguishes dplyr::pull from purrr::pluck and magrittr::extract2?

In the past, when working with a data frame and wanting to get a single column as a vector, I would use magrittr::extract2() like this:
library(dplyr)
library(magrittr)
mtcars %>%
  mutate(wt_to_hp = wt/hp) %>%
  extract2('wt_to_hp')
But I've seen that dplyr::pull() and purrr::pluck() also exist to do much the same job: return a single vector from a data frame, not unlike [[.
Assuming that I'm always loading all 3 libraries for any project I work on, what are the advantages and use cases of each of these 3 functions? Or more specifically, what distinguishes them from each other?
When you "should" use a function is really a matter of personal preference: use whichever one expresses your intention most clearly. There are differences between them, though. For example, pluck works better when you want to do multiple extractions. From the help file:
accessor(x[[1]])$foo
# is the same as
pluck(x, 1, accessor, "foo")
so while it can be used just to extract a column, it is most useful when you have more deeply nested structures or want to compose the extraction with an accessor function.
The pull function is meant to blend in with the rest of the dplyr functions. It can take the name of a column in any of the ways the other functions in the package accept. For example, it works with !!-style unquoting, whereas extract2 does not.
library(dplyr)
irispull <- function(x) {
  iris %>% pull(!!enquo(x))
}
irispull(Sepal.Length)
And extract2 is nothing more than a "more readable" alias for the base function [[. In fact, it is defined as .Primitive("[["), so it expects column names as character strings or column positions as integers.
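As a rough side-by-side check, the sketch below extracts the same column from the question's mtcars example with all three functions, then shows a few of the differences mentioned above: pull() defaults to the last column and accepts bare names (here via the {{ }} embrace operator available since rlang 0.4.0, a newer alternative to !!enquo()), while pluck() can go several levels deep and returns NULL for a missing index where [[ (and therefore extract2) throws an error. The names irispull2() and nested are hypothetical, used only for illustration.

```r
library(dplyr)
library(purrr)
library(magrittr)

df <- mtcars %>% mutate(wt_to_hp = wt / hp)

v_pull    <- df %>% pull(wt_to_hp)        # bare column name (tidy evaluation)
v_pluck   <- df %>% pluck("wt_to_hp")     # string or index, like [[
v_extract <- df %>% extract2("wt_to_hp")  # literally the [[ primitive

# pull() with no column argument returns the last column (var = -1)
v_last <- df %>% pull()

# Hypothetical variant of irispull() using {{ }} instead of !!enquo()
irispull2 <- function(x) {
  iris %>% pull({{ x }})
}

# pluck() walks several levels in one call and returns NULL when an
# element is missing, whereas nested[[2]] would be an error
nested <- list(a = list(b = 1))
pluck(nested, "a", "b")   # same as nested[["a"]][["b"]]
pluck(nested, 2)          # NULL
```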

data.table: feeding list of logical conditions in i

I have to write a long script (call it Script) that performs various operations on a data.table, and then run it a few times on different subsets of rows. I would like to be able to do the following:
condition <- "X>10"
source(Script)
... where Script will contain many of the following:
dt[MAGIC(condition), .......]
This would allow me to keep Script and the condition in different files (the latter being a markdown document that only shows the results, and which I'd like to keep as simple as possible in terms of code).
What I don't want is to copy-paste the script for each condition and change it manually, since this is far too error-prone.
I tried lots of combinations of parse, deparse, substitute, quote, as.expression, as.logical, etc. but I seem to be staggering in the dark. I'd be really grateful if somebody could help!
NB: I can easily do the above in dplyr:
df %>% filter_(condition)
and of course I can also turn this back into a data.table
df %>% filter_(condition) %>% data.table()
... but I'd rather work consistently with data.table (faster, prefer the syntax, etc.)
We can use eval(parse(text = ...)):
library(data.table)
setDT(dt)[eval(parse(text = condition))]
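A minimal, self-contained sketch of that idiom (the table dt and the condition strings are made up for illustration): each condition is kept as a string and applied in i through eval(parse(text = ...)), so the same code can be rerun for several row subsets without copy-pasting. Because i is evaluated within the table, column names such as X in the parsed expression resolve as usual.

```r
library(data.table)

# Toy table standing in for the real data (hypothetical)
dt <- data.table(X = c(5, 12, 20, 8), Y = c("a", "b", "c", "d"))

# Conditions kept as strings, e.g. defined in a separate file
conditions <- c("X > 10", "X < 10 & Y != 'a'")

# Apply the same subsetting step once per condition
subsets <- lapply(conditions, function(cond) dt[eval(parse(text = cond))])

subsets[[1]]
```

Note that eval(parse()) will run any code contained in the string, so the condition strings should come from a trusted source such as your own files.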

How to chain functions in R, like method chaining with LINQ in C#?

I am new to R, and one thing I noticed is that we need to keep saving the result to a variable each time before further processing. Is there some way to store the result in a buffer and use it later on in further processing?
For people familiar with C#: with LINQ we have a feature called method chaining, where we keep passing the intermediate result to various functions on the fly without storing it in separate variables, and at the end we get the required output. This saves a lot of extra syntax, so is there something like this in R?
Function composition is to functional programming as method chaining is to object-oriented programming.
x <- foo(bar(baz(y)))
is basically the same as
x = y.baz().bar().foo()
in the languages you might be familiar with.
If you're uncomfortable with nested parens and writing things backwards, the magrittr package provides the %>% operator to unpack expressions:
library(magrittr)
x = y %>% baz() %>% bar() %>% foo()
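To make the equivalence concrete, here is a small example with real functions: sqrt(), sum() and round() stand in for baz(), bar() and foo(). The last line uses the native pipe |>, which base R has provided since version 4.1.0 without needing magrittr.

```r
library(magrittr)

y <- c(4, 9, 16)

nested <- round(sum(sqrt(y)), 1)              # read inside-out
piped  <- y %>% sqrt() %>% sum() %>% round(1) # read left to right
native <- y |> sqrt() |> sum() |> round(1)    # base R pipe (R >= 4.1.0)

nested
# [1] 9
```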
R also supports conventional OO programming through a couple of frameworks: reference classes (built in) and the R6 package. With those, you can write something like
x = y$baz()$bar()$foo()
but I'd suggest learning how to deal with "normal" R expressions first.
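For completeness, a minimal R6 sketch of what makes $-chaining possible (the Counter class is hypothetical, and the R6 package must be installed): each method returns invisible(self), so the next method call can be attached directly to its result.

```r
library(R6)

# Hypothetical class used only to illustrate method chaining
Counter <- R6Class("Counter",
  public = list(
    value = 0,
    add = function(n = 1) {
      self$value <- self$value + n
      invisible(self)  # returning self lets calls be chained
    },
    times = function(k) {
      self$value <- self$value * k
      invisible(self)
    }
  )
)

cnt <- Counter$new()
cnt$add()$add(5)$times(2)$value
# [1] 12
```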
In R we have something called pipes (%>%), through which you can send the output of one function to another, i.e. the output of one function becomes the input of the next function in the chain.
Try something like this in the R console. Consider a tibble MyData containing username and pwd as two columns; you can use pipes as follows:
library(dplyr)
MyData %>%
  select(username, pwd) %>%
  filter(!is.na(username)) %>%
  arrange(username)
This prints the username and pwd columns for every row with a non-NA username, sorted by username.
Hope that helps.
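A self-contained version of that pipeline, with a small made-up tibble so it can be run directly (the column names follow the answer; the data are hypothetical):

```r
library(dplyr)

# Hypothetical data with one missing username
MyData <- tibble(
  username = c("carol", NA, "alice", "bob"),
  pwd      = c("c-pass", "x-pass", "a-pass", "b-pass")
)

result <- MyData %>%
  select(username, pwd) %>%       # keep only these two columns
  filter(!is.na(username)) %>%    # drop rows with a missing username
  arrange(username)               # sort alphabetically by username

result
```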

R repeating a function

I have a list of values of arbitrary length: parameter <- c(2, 1, 3, 4, 5, ...)
and I would like to call a function once with each parameter:
MyFunction('2')
MyFunction('1')
MyFunction('3')
etc.
Thank you very much for any tips.
Like most things in R, there's more than one way of handling this problem. The tidyverse solution is first, followed by base R.
purrr/map
I don't have details about your desired output, but the map function from the purrr package will work in the situation you describe. Let's use the function plus_one() to demonstrate.
library(tidyverse) # Loads purrr and other useful functions
plus_one <- function(x) {x + 1} # Define our demo function
parameter <- c(1,2,3,4,5,6,7,8,9)
map(parameter, plus_one)
map returns a list, which isn't always what you want. There are specialized versions of map for specific kinds of output: depending on what you want, you can use map_chr, map_int, etc. In this case, we could use map_dbl to get a numeric vector of the returned values.
map_dbl(parameter, plus_one)
Base R
The apply family of functions from base R could also meet your needs. I prefer using purrr but some people like to stick with built-in functions.
lapply(parameter, plus_one)
sapply(parameter, plus_one)
You end up with the same results.
identical({map(parameter, plus_one)}, {lapply(parameter, plus_one)})
# [1] TRUE
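Applied to the question's setup, where MyFunction() receives the parameter as a string, a sketch might look like the following. MyFunction() here is a hypothetical stand-in; purrr::walk() is the variant for functions called only for their side effects (printing, writing files), and vapply() is a type-checked base R counterpart of map_chr().

```r
library(purrr)

# Hypothetical stand-in for the question's MyFunction(), which takes a string
MyFunction <- function(p) paste("processed", p)

parameter <- c(2, 1, 3, 4, 5)
as_text <- as.character(parameter)   # "2", "1", "3", ... as in MyFunction('2')

# Collect the return values
res_purrr <- map_chr(as_text, MyFunction)
res_base  <- vapply(as_text, MyFunction, character(1), USE.NAMES = FALSE)

# Call only for side effects
walk(as_text, function(p) print(MyFunction(p)))

# A plain for loop works just as well
for (p in as_text) print(MyFunction(p))
```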

How to use non-standard evaluation correctly with summarize (dplyr)

I want to pass variables to 'summarize' using a non-standard-evaluation approach (see http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions).
My script is as follows:
library(dplyr)
library(pryr)
x2<-data.frame(x=runif(1000,1,10),y=rnorm(1:1000))
y2<-group_by(x2,x)
field2<-"x"
z<-substitute(summarize(y2,check=sum(x)),list(x=as.name(field2)))
eval(quote(z),parent.frame())
But the output is not a data frame, as I expected, but a string:
>eval(quote(z),parent.frame())
summarize(y2, check = sum(x))
I am a little confused by non-standard evaluation, although I have looked through a number of examples.
Could you point out what is wrong with my approach?
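A short sketch of what is going on, reusing the question's objects (rnorm(1000) is used instead of rnorm(1:1000); both draw 1000 values, since rnorm() takes the length of a vector n): quote(z) returns the symbol z itself, so eval(quote(z)) only looks up z and returns the call stored in it, which prints like the text shown above. Evaluating z directly runs that call. The last line shows an alternative that injects the column name with !! instead of building the call by hand.

```r
library(dplyr)

x2 <- data.frame(x = runif(1000, 1, 10), y = rnorm(1000))
y2 <- group_by(x2, x)
field2 <- "x"
z <- substitute(summarize(y2, check = sum(x)), list(x = as.name(field2)))

class(eval(quote(z)))  # "call": only the symbol z was evaluated
res <- eval(z)         # evaluates the stored call, giving a tibble

# Alternative: inject the column name into summarize() with !!
res2 <- summarize(y2, check = sum(!!as.name(field2)))
```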
