I have a function that takes two arguments:
my.function <- function(name, value) {
print(name)
print(value) #using print as example
}
I have an integer vector that has names and values:
freq.chars <- table(sample(LETTERS[1:5], 10, replace=TRUE))
I'd like to use lapply to apply my.function to freq.chars, where the name of each item is passed in as name and the value (in this case the frequency) is passed in as value.
When I try,
lapply(names(freq.chars), my.function)
I get an error that "value" is missing with no default.
I've also tried
lapply(names(freq.chars), my.function, name = names(freq.chars), value = freq.chars)
in which case I get an error: unused argument value = c(...).
Sorry for the edits and the lack of clarity; I'm new at this...
We use this test data:
set.seed(123) # needed for reproducibility
char.vector <- sample(LETTERS[1:5], 10, replace=TRUE)
freq.chars <- table(char.vector)
Here are several variations:
# 1. iterate simultaneously over names and values
mapply(my.function, names(freq.chars), unname(freq.chars))
# 2. same code except Map replaces mapply. Map returns a list.
Map(my.function, names(freq.chars), unname(freq.chars))
# 3. iterate over index and then turn index into names and values
sapply(seq_along(freq.chars),
function(i) my.function(names(freq.chars)[i], unname(freq.chars)[i]))
# 4. same code as last one except lapply replaces sapply. Returns list.
lapply(seq_along(freq.chars),
function(i) my.function(names(freq.chars)[i], unname(freq.chars)[i]))
# 5. this iterates over names rather than over an index
sapply(names(freq.chars), function(nm) my.function(nm, freq.chars[[nm]]))
# 6. same code as last one except lapply replaces sapply. Returns list.
lapply(names(freq.chars), function(nm) my.function(nm, freq.chars[[nm]]))
Note that mapply and sapply have an optional USE.NAMES argument that controls whether names are inferred for the result, and an optional simplify argument (SIMPLIFY for mapply) which controls whether list output is simplified. Use these arguments for further control.
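To see what the mapply and Map variations return, rather than what they print, here is a minimal sketch. It assumes a hypothetical variant, my.function2, that returns a string instead of printing, and a small fixed vector in place of the random sample so the counts are predictable:

```r
# Hypothetical variant of my.function that returns a value instead of printing it
my.function2 <- function(name, value) {
  paste(name, value, sep = ": ")
}

# Fixed data instead of sample(), so the counts are known: A = 3, B = 1, C = 1
freq.fixed <- table(c("A", "B", "A", "C", "A"))

# mapply walks names and counts in parallel; with USE.NAMES = TRUE (the default)
# the character vector of names is reused as the names of the result
res_mapply <- mapply(my.function2, names(freq.fixed), unname(freq.fixed))
print(res_mapply)

# Map does the same but always returns a list
res_map <- Map(my.function2, names(freq.fixed), unname(freq.fixed))
str(res_map)
```

Because my.function2 returns length-one strings, mapply simplifies the result to a named character vector, while Map keeps a named list.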
If you just want to pass another fixed argument to your function, specify it after the function name (the 3rd argument of lapply, which forwards it through its ... argument).
lapply(names(freq.chars), my.function, char.vector)
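A fixed extra argument goes through lapply's ... and is passed unchanged to every call; mapply takes the equivalent through its MoreArgs argument. A small sketch, where add.suffix and the "_x" suffix are hypothetical:

```r
# Hypothetical function with one per-element argument and one fixed argument
add.suffix <- function(name, suffix) paste0(name, suffix)

keys <- c("A", "B", "C")

# lapply: each element of keys becomes `name`; "_x" is forwarded via ... as `suffix`
via_lapply <- lapply(keys, add.suffix, suffix = "_x")

# mapply: vectorised arguments go in ..., constants go in MoreArgs (a list)
via_mapply <- mapply(add.suffix, keys, MoreArgs = list(suffix = "_x"),
                     USE.NAMES = FALSE)

print(unlist(via_lapply))
print(via_mapply)
```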
Using the built-in "iris" data frame in R, how come, when creating a new column "NewCol", this
iris[,'NewCol'] = as.POSIXlt(Sys.Date()) # throws a warning
but this
iris$NewCol = as.POSIXlt(Sys.Date()) # is correct
This issue doesn't occur when assigning primitive types such as chr, int, float, ...
First, notice that, as @sindri_baldur pointed out, as.POSIXlt returns a list (a POSIXlt object is stored internally as a list of date-time components).
From R help ($<-.data.frame):
There is no data.frame method for $, so x$name uses the default method which treats x as a list (with partial matching of column names if the match is unique, see Extract). The replacement method (for $) checks value for the correct number of rows, and replicates it if necessary.
So, if you try iris[, "NewCol"] <- as.POSIXlt(Sys.Date()), you get a warning that you are trying to assign a list object to a vector, and only the first element of the list is used.
Again, from R help:
"For [ the replacement value can be a list: each element of the list is used to replace (part of) one column, recycling the list as necessary".
And in your case, only one column is specified, meaning that only the first element of the list returned by as.POSIXlt will be used. You are warned about that.
Using $ syntax, the iris data.frame is treated as a list, and the result of as.POSIXlt (a list again) is appended to it. The final result is still a data.frame, but take a look at the type of the new column (NewCol2 below): it's a list.
iris[, "NewCol"] <- as.POSIXlt(Sys.Date()) # warning
iris$NewCol2 <- as.POSIXlt(Sys.Date())
typeof(iris$NewCol) # double
typeof(iris$NewCol2) # list
Suggestion: maybe you wanted to use as.POSIXct()?
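A minimal sketch of that suggestion, working on a copy of iris (the df name and the NewCol3 column are only for illustration). as.POSIXct stores the time as a double, so the new column is an ordinary atomic date-time column rather than a list:

```r
# Work on a copy so the built-in iris dataset is left untouched
df <- iris

# as.POSIXlt() gives a broken-down time that is stored internally as a list
lt <- as.POSIXlt(Sys.Date())
print(typeof(lt))

# as.POSIXct() stores seconds since the epoch in a double vector,
# so it is recycled down the column like any atomic value
df$NewCol3 <- as.POSIXct(Sys.Date())
print(class(df$NewCol3))
print(typeof(df$NewCol3))
```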
When I try to pass a list whose names are not NULL, I get the following error from evaluating do.call: Error: argument "x" is missing, with no default. Is there another way to bypass the names of the list and access the actual elements within the list, without setting the names to NULL?
# with NULL names, do.call runs
num_list <- list(1:10)
do.call(mean,num_list)
# without names being NULL, do.call fails
names(num_list) <- 'a'
do.call(mean,num_list)
Specifically, I'd like to pass the list to a function's ellipsis, such as that of raster::merge: https://www.rdocumentation.org/packages/raster/versions/3.3-7/topics/merge.
library(rgdal)
library(sf)
library(raster)
cities <- sf::st_read(system.file("vectors/cities.shp", package = "rgdal"))
birds <- sf::st_read(system.file("vectors/trin_inca_pl03.shp", package = "rgdal"))
sf_shapes <- list(cities, birds)
# without names works
sf_shape_extents = lapply(sf_shapes, raster::extent)
sf_max <- do.call(what = raster::merge, args = sf_shape_extents)
# with names does not
names(sf_shapes) <- c('cities', 'birds')
sf_shape_extents_names = lapply(sf_shapes, raster::extent)
sf_max_names <- do.call(what = raster::merge, args = sf_shape_extents_names)
Either ensure that the names of the list being passed in correspond to the parameter names of the function, or that the list is unnamed and the position of each list element corresponds to the position of the parameter in question.
names(num_list) <- 'x'
do.call(mean,num_list)
[1] 5.5
names(num_list) <- 'a'
do.call(mean,unname(num_list))
[1] 5.5
EDIT:
I do not see any structural change in your edited version. The error is caused by the names, since they do not correspond to the named parameters of the function. You are passing in named arguments that match none of its parameters, so the required ones are left missing and an error is thrown.
The question you need to ask yourself is what are the parameter names of the function you intend to use?
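If the ellipsis of a function accepts unnamed arguments, it does not matter whether the arguments passed in are named, as long as the names do not match one of its other parameters. For example, the paste function in R (whose other parameters are sep and collapse):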
a <- list(a="a",b=3,c="d");
do.call(paste,a)
[1] "a 3 d"
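The same failure and both fixes can be reproduced without raster. This sketch uses a hypothetical combine function whose signature (x, y, ...) has the same shape as the one discussed above; it is not the raster::merge implementation:

```r
# Hypothetical stand-in with a (x, y, ...) signature; NOT raster::merge
combine <- function(x, y, ...) c(x, y, ...)

parts <- list(cities = 1:2, birds = 3:4)

# Names that match no formal argument are swallowed by ..., so x is missing
named_result <- tryCatch(do.call(combine, parts),
                         error = function(e) conditionMessage(e))
print(named_result)

# Option 1: drop the names so the elements are matched by position
by_position <- do.call(combine, unname(parts))

# Option 2: rename the elements to match the formal arguments
by_name <- do.call(combine, setNames(parts, c("x", "y")))

print(by_position)
print(by_name)
```

For the real case, the analogous call would presumably be do.call(raster::merge, unname(sf_shape_extents_names)), though that has not been run here.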
I created the following function, which checks whether a column's correlation with the target exceeds a threshold. For this purpose, the function is applied to the diamonds dataset (assigned to dt here).
select_variables_gen <- function(variable, target = dt$price, threshold = 0.9){
if(all(class(variable) %in% c("numeric","integer"))){
corr <- abs(cor(variable, target));
if(corr > threshold){
return(T);
}else{F}
}else{F}
};
Now that I want to apply the function, I can't figure out how to specify its arguments. This is what I tried:
alt_selected_gen <- names(dt)[sapply(dt,
select_variables(variable = dt, target = dt$carat, threshold = 0.1))]
alt_selected_gen;
This returns an error saying that the 2nd and 3rd arguments are unused. How can I use the function (with sapply or any other way) so that I can specify the arguments?
My desired output is the names of the columns that have a correlation above the threshold. So, using the default values with the above code, that would be:
[1] "carat" "price"
You need to pass your function to sapply, but what you are trying to pass is a call to your function.
When you use sapply on a data frame, the columns are sent one by one to your function as its first argument. If you want to pass further named arguments to your function, just add them directly as arguments to sapply after the function itself. This works because of the dots argument (...) in sapply's formal arguments, which passes any extra arguments on to each call of your function.
It should therefore just be
names(dt)[sapply(dt, select_variables_gen, target = dt$carat, threshold = 0.1)]
#> [1] "carat" "table" "price" "x" "y" "z"
Notice also that the function is called select_variables_gen in your example, not select_variables.
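Here is a self-contained sketch of the same pattern. It assumes a compact, hypothetical rewrite of the function (select_vars, with no default target) and a toy data frame in place of diamonds, which would need ggplot2:

```r
# Compact, hypothetical variant of select_variables_gen; target has no default here
select_vars <- function(variable, target, threshold = 0.9) {
  is.numeric(variable) && abs(cor(variable, target)) > threshold
}

# Toy data standing in for diamonds
dt_toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 11),   # almost a linear function of a
  c = c(5, 1, 4, 2, 3),    # weakly (negatively) correlated with a
  label = c("p", "q", "r", "s", "t")
)

# Extra named arguments after the function are forwarded through sapply's ...
keep <- sapply(dt_toy, select_vars, target = dt_toy$a, threshold = 0.5)
print(names(dt_toy)[keep])

# Equivalent: wrap the call in an anonymous function of one argument
keep2 <- sapply(dt_toy, function(col) select_vars(col, target = dt_toy$a, threshold = 0.5))
```

The anonymous-function form is handy when the column should not be the first argument of the function being applied.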
I've got a quoted list
quote(list(orders = .N,
total_quantity = sum(quantity)))
(that I eventually eval in the j part of a data.table)
What I would like is to extract the names of that list without evaluating the expression, because evaluating it outside the correct environment produces an error.
The list doesn't have any names at that point. It's not even a list; it's a call to the list() function. That said, you can take that call apart and extract the argument names. For example:
x <- quote(list(orders = .N,
total_quantity = sum(quantity)))
names(as.list(x))[-1]
# [1] "orders" "total_quantity"
Calling as.list() on the call turns it into a (named) list of its components without evaluating anything.
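A short sketch of what as.list() exposes. Nothing is evaluated, so .N and quantity do not need to exist; the second call y is a made-up example showing the case where no argument is named:

```r
x <- quote(list(orders = .N,
                total_quantity = sum(quantity)))

# The first component of the call is the function (the symbol `list`)...
print(x[[1]])

# ...and the remaining components are the unevaluated arguments
arg_names <- names(as.list(x))[-1]
print(arg_names)

# If none of the arguments is named, there is no names attribute at all
y <- quote(list(.N, sum(quantity)))
print(names(as.list(y)))
```

So code that relies on the names should be prepared for NULL when the quoted list has no named entries.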
If I want to create a named list, where I have named literals, I can just do this:
list(foo=1,bar=2,baz=3)
If instead I want to make a list with arbitrary computation, I can use lapply, so for example:
lapply(list(1,2,3), function(x) x)
However, the list generated by lapply will always be a plain, unnamed list. Is there a way to generate a named list using a function like lapply?
My idea is something along the lines of:
lapply(list("foo","bar","baz"), function(key) {key=5})
==>
list(foo=5,bar=5,baz=5)
That way I don't have to have the keys and values as literals.
I do know that I could do this:
res = list()
for (key in list("foo","bar","baz")) {
res[key] <- 5;
}
But I don't like how I have to create an empty list and mutate it to fill it out.
Edit: I would also like to do some computation based on the key. Something like this:
lapply(c("foo","bar","baz"), function(key) {paste("hello",key)=5})
sapply uses its X argument as the names of the result if X is a character vector (USE.NAMES = TRUE is the default), so you can try:
sapply(c("foo","bar","baz"), function(key) 5, simplify = FALSE)
Which produces:
$foo
[1] 5
$bar
[1] 5
$baz
[1] 5
If your list already has names, lapply will preserve them:
lapply(list(a=1,b=2,c=3), function(x) x)
Or you can set the names before or after with setNames():
#before
lapply(setNames(list(1,2,3),c("foo","bar","baz")), function(x) x)
#after
setNames(lapply(list(1,2,3), function(x) x), c("foo","bar","baz"))
One other option is Map(). Map takes the names from the first argument you pass in (a character vector without names is used as the names itself). You can ignore the values in the function and use that argument only for its side effect of supplying the names:
Map(function(a,b) 5, c("foo","bar","baz"), list(1:3))
Note, however, that names cannot be changed during the lapply/Map steps; they can only be copied from another location. If you need to mutate names, you'll have to do that as a separate step.
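For the edit asking for computation based on the key, a minimal sketch of that separate step: compute the values with lapply, compute the names from the same keys, then attach them with setNames. Using nchar(key) as the value is only an illustrative assumption:

```r
keys <- c("foo", "bar", "baz")

# Compute the values from the keys (here: the number of characters in each key)
values <- lapply(keys, function(key) nchar(key))

# Compute the names separately, then attach them in one step
res <- setNames(values, paste("hello", keys))
str(res)

# The simple case where the names are the keys themselves
res2 <- sapply(keys, function(key) nchar(key), simplify = FALSE)
```

No empty list is created and mutated; each step returns a new object.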