I have a list of unspecified length: parameter <- c(2, 1, 3, 4, 5, ...)
And I would like to call a function repeatedly with each parameter:
MyFunction('2')
MyFunction('1')
MyFunction('3') etc.
Thank you very much for any tips
Like most things in R, there's more than one way of handling this problem. The tidyverse solution is first, followed by base R.
purrr/map
I don't have details about your desired output, but the map function from the purrr package will work in the situation you describe. Let's use the function plus_one() to demonstrate.
library(tidyverse) # Loads purrr and other useful functions
plus_one <- function(x) {x + 1} # Define our demo function
parameter <- c(1,2,3,4,5,6,7,8,9)
map(parameter, plus_one)
map returns a list, which isn't always desired. There are specialized versions of map for specific kinds of output: depending on what you want, you can use map_chr, map_int, etc. In this case, we could use map_dbl to get a numeric vector of the returned values.
map_dbl(parameter, plus_one)
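As a small illustrative sketch (not from the question), a type-specific variant such as map_chr needs a function that returns a single string per element:
map_chr(parameter, ~ paste0("result_", .x + 1)) # character vector: "result_2", "result_3", ..., "result_10"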
Base R
The apply family of functions from base R could also meet your needs. I prefer using purrr but some people like to stick with built-in functions.
lapply(parameter, plus_one)
sapply(parameter, plus_one)
You end up with the same results.
identical({map(parameter, plus_one)}, {lapply(parameter, plus_one)})
# [1] TRUE
Related
This is a problem I often encounter: I try to access an object's own name when using a function from the apply family and spend hours figuring out how to do it... For instance (this is not the core of my question), today I wanted to inspect an attached package to figure out whether it contained any non-function objects. After a lot of trial and error, I finally came up with this (for the rrapply package - I know looking at the documentation is also easy, but this one illustrates the problem well):
library(rrapply)
eapply(rlang::pkg_env('rrapply'), function(x) {if(!is.function(x)) x}) %>%
`[`(sapply(., function(x) !is.null(x))) %>%
names()
## [1] "renewable_energy_by_country" "pokedex"
I feel that is really too complicated for a simple test!
So my question: is there an easy way to loop through an object in base R (or maybe the tidyverse) and return only the names of those elements that satisfy a certain condition? rrapply seems able to achieve that, but:
it is fairly complicated
it seems to work on lists only and to loop through all sub-elements as well, which is not desired
Thanks!
Identify the environment of interest, e, and then use eapply with the indicated function, taking the names of the extracted elements at the end. This isn't conceptually different from the code in the question, but it does seem somewhat less complex when done in base R in the following way:
e <- as.environment("package:rrapply")
names(Filter(`!`, eapply(e, is.function)))
or the same code written as a pipeline:
library(magrittr)
"package:rrapply" %>%
as.environment %>%
eapply(is.function) %>%
Filter(`!`, .) %>%
names
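Run against the same version of rrapply as in the question, both forms should return the two names reported there:
## [1] "renewable_energy_by_country" "pokedex"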
I'm having some trouble writing some code that dispatches on matrices. To assist me, I'd like to see what generic functions in the base library dispatch on matrices. Is there any way to get R to give me a list of them? Failing that, does anyone know of any members of that list?
There are at least seven generic functions in base R that have matrix methods:
anyDuplicated
determinant
duplicated
isSymmetric
subset
summary
unique
You can get them with
getS3method("anyDuplicated", class="matrix")
or just
anyDuplicated.matrix
Found using
Filter(function(x) {
  !is.null(getS3method(x, class = "matrix", optional = TRUE))
}, ls(all.names = TRUE, envir = baseenv()))
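For example (a small sketch, not part of the original answer), calling one of those generics on a matrix dispatches to the corresponding .matrix method:
m <- matrix(c(1, 2, 1, 2), nrow = 2, byrow = TRUE)
summary(m)    # dispatches to summary.matrix and summarises each column
duplicated(m) # dispatches to duplicated.matrix and compares whole rows: FALSE TRUE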
In the past, when working with a data frame and wanting to get a single column as a vector, I would use magrittr::extract2() like this:
mtcars %>%
mutate(wt_to_hp = wt/hp) %>%
extract2('wt_to_hp')
But I've seen that dplyr::pull() and purrr::pluck() also exists to do much the same job: return a single vector from a data frame, not unlike [[.
Assuming that I'm always loading all 3 libraries for any project I work on, what are the advantages and use cases of each of these 3 functions? Or more specifically, what distinguishes them from each other?
When you "should" use a function is really a matter of personal preference. Which function expresses your intention most clearly. There are differences between them. For example, pluck works better when you want to do multiple extractions. From help file:
accessor(x[[1]])$foo
# is the same as
pluck(x, 1, accessor, "foo")
so while it can be used to just extract a column, it's most useful when you have more deeply nested structures or you want to compose with an accessor function.
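For instance, with a hypothetical nested list (just to illustrate):
x <- list(list(foo = 1:3, bar = "a"), list(foo = 4:6, bar = "b"))
pluck(x, 1, "foo")      # same as x[[1]][["foo"]]
pluck(x, 2, "foo", sum) # accessor at the end: sum(x[[2]][["foo"]])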
The pull function is meant to blend in with the rest of the dplyr functions. It can take the name of a column in any of the ways you can refer to columns with other functions in the package. For example, it works with !!-style unquoting, where extract2, say, will not:
irispull <- function(x) {
iris %>% pull(!!enquo(x))
}
irispull(Sepal.Length)
And extract2 is nothing more than a "more readable" wrapper around the base function [[. In fact, it's defined as .Primitive("[["), so it expects column names as character strings or column indexes as integers.
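So, as a quick sketch of that last point, either of these works (wt is the 6th column of mtcars):
mtcars %>% extract2("wt") # by name, equivalent to mtcars[["wt"]]
mtcars %>% extract2(6)    # by position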
I have a script that takes too long to run and I'm trying to parallelize its execution.
The script basically loops through each row of a data frame and performs some calculations, as shown below:
my.df = data.frame(id=1:9,value=11:19)
sumPrevious <- function(df,df.id){
sum(df[df$id<=df.id,"value"])
}
for(i in 1:nrow(my.df)){
print(sumPrevious(my.df,my.df[i,"id"]))
}
I'm starting to learn to parallelize code in R, this is why I first want to understand how I could do this with an apply-like function (e.g. sapply,lapply,mapply).
I've tried multiple things but nothing worked so far:
mapply(sumPrevious,my.df,my.df$id) # Error in df$id : $ operator is invalid for atomic vectors
Using the parallel package in R, you can use the mclapply() function. You will need to adjust your code a little bit to make it run in parallel.
library(parallel)
my.df = data.frame(id=1:9,value=11:19)
sumPrevious <- function(i, df) {
  df.id <- df$id[i]
  sum(df[df$id <= df.id, "value"])
}
mclapply(X = 1:nrow(my.df), FUN = sumPrevious, my.df, mc.preschedule = TRUE, mc.cores = no.of.cores)
This code will run sumPrevious in parallel on no.of.cores cores of your machine.
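Note that no.of.cores is a placeholder you still have to set yourself; one common choice (a suggestion, not part of the original answer) is to detect the available cores and leave one free. Also keep in mind that mclapply() relies on forking, so mc.cores greater than 1 only takes effect on Unix-alike systems:
no.of.cores <- max(1, parallel::detectCores() - 1, na.rm = TRUE) # use all but one core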
Well, this is fun to play with. You kind of need something like the following:
mapply(sumPrevious,list(my.df),my.df$id)
For sapply, since the first argument of sumPrevious is the data frame, you will have to wrap it in an anonymous function so that sapply can match the arguments correctly:
sapply(my.df$id,function(x,y) sumPrevious(y,x),my.df)
I prefer mapply here since we can pass the data frame directly as the first input. But mapply iterates over its arguments, and we want the whole data frame on every call; that's why you have to wrap it in list().
Map is a wrapper around mapply and would just present the solution in list format; try it. Also, lapply is similar to sapply, except that sapply simplifies the results into an array format while lapply gives the same results as a list.
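For instance, with the same data as above (a quick sketch to show the list output):
Map(sumPrevious, list(my.df), my.df$id) # a list of the same nine running sums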
Though it seems whatever you are trying to do can simply be done with the cumsum function:
cumsum(my.df$value)
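As a quick check (using the my.df and sumPrevious from the question), the cumulative sum matches what the loop prints, because the rows are already ordered by id:
loop_result <- sapply(1:nrow(my.df), function(i) sumPrevious(my.df, my.df[i, "id"]))
identical(loop_result, cumsum(my.df$value))
# [1] TRUE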
There seems to be general agreement that the l in "lapply" stands for list, the s in "sapply" stands for simplify and the r in "rapply" stands for recursively. But I could not find anything on the t in "tapply". I am now very curious.
Stands for table since tapply is the generic form of the table function. You can see this by comparing the following calls:
x <- sample(letters, 100, rep=T)
table(x)
tapply(x, x, length)
although obviously tapply can do more than counting.
Also, some references that refer to "table-apply":
R and S Plus companion
Modern Applied Biostatistical Methods
I think of it as 'table'-apply, since the result comes back as a matrix/table/array whose dimensions are established by the INDEX arguments. An R table-classed object is really very similar in construction and behavior to an R matrix or array. The application is performed in a manner similar to that of ave: groups are first assembled on the basis of the "factorized" INDEX argument list (possibly with multiple dimensions), and a matrix or array is returned with the results of FUN applied to each cross-classified grouping.
The other somewhat similar function is 'xtabs'. I keep thinking it should have a "FUN" argument, but what I'm probably forgetting at that point is really tapply.
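A small sketch with a built-in dataset shows the table-like shape: the two INDEX factors become the dimensions of the result, with FUN applied within each cross-classified group:
with(warpbreaks, tapply(breaks, list(wool, tension), mean))
# a 2 x 3 matrix of group means: wool (A, B) by tension (L, M, H)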
tapply is sort of the odd man out. As far as I know, and as far as the R documentation for the apply functions goes, the 't' does not stand for anything, unlike the other apply functions which indicate the input or output options.