Trouble applying function to data frame - r

Toy example:
> myfn = function(a,x){sum(a*x)}
> myfn(a=2, x=c(1,2,3))
[1] 12
Good so far. Now:
> df = data.frame(a=c(4,5))
> df$ans = myfn(a=df$a, x=c(1,2,3))
Warning message:
In a * x : longer object length is not a multiple of shorter object length
> df
a ans
1 4 26
2 5 26
What I want is for the first row to be computed as if I had called myfn(a=4, x=c(1,2,3)), giving an answer of 24, and for the second row to be computed as if I had called myfn(a=5, x=c(1,2,3)), giving an answer of 30. How do I do this? Thank you.
EDIT: slightly more complex version. Now suppose that the function is
myfn = function(a,b, x){sum((a+b)*x)}
and that I have the data frame
df = data.frame(a=c(4,5), b=c(6,7), c=c(9,9))
I want to create df$ans such that, for the first row, it is as if I called myfn(a=4, b=6, x=c(1,2,3)), and for the second row, it is as if I called myfn(a=5, b=7, x=c(1,2,3)); that is, use df$a for a, df$b for b, and ignore df$c.

Something like this would work:
myfn = function(a,x){
return(sum(a*x))
}
df <- data.frame(a=c(4,5))
df$ans <- apply(df, 1, myfn, x = c(1,2,3))
df
a ans
1 4 24
2 5 30
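To see where the 26 in the question comes from, here is a small sketch using plain vectors (the names recycled and per_row are only illustrative). R recycles the shorter vector a to c(4, 5, 4) before multiplying, so a single sum comes back instead of one value per row; calling the function once per element of a avoids that.

```r
a <- c(4, 5)
x <- c(1, 2, 3)

# a is recycled to c(4, 5, 4), so this is 4*1 + 5*2 + 4*3 = 26
# (R also emits the "longer object length" warning here)
recycled <- suppressWarnings(sum(a * x))

# One call per element of a gives one answer per row
per_row <- sapply(a, function(ai) sum(ai * x))
per_row
# [1] 24 30
```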
** Edited Based On User Edit **
df = data.frame(a=c(4,5), b=c(6,7), c=c(9,9))
df$ans <- apply(df[, c("a", "b")], 1, function(y) sum((y['a']+y['b'])*c(1,2,3)))
df
a b c ans
1 4 6 9 60
2 5 7 9 72

There are several ways this can be done, each with its own charms. If you don't want to modify the function, I would just do
mapply(myfn, df$a, df$b, MoreArgs = list(x = 1:3))
Alternatively, you can bake the iteration right into the function, e.g.,
myfn = function(a,b, x){
sapply(a+b, function(ab) {
sum(ab*x)
})
}
myfn(df$a, df$b, 1:3)
That's probably the way I would do it.
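For reference, a self-contained run of the mapply approach against the data frame from the question's edit might look like the sketch below. It assumes the columns are named a, b and c as in that edit; c is simply not passed to the function.

```r
myfn <- function(a, b, x) sum((a + b) * x)

df <- data.frame(a = c(4, 5), b = c(6, 7), c = c(9, 9))

# mapply walks df$a and df$b in parallel; x is held fixed through MoreArgs
df$ans <- mapply(myfn, df$a, df$b, MoreArgs = list(x = 1:3))
df$ans
# [1] 60 72
```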

Related

Function argument as value or column name for data.table

I want my function to be able to take a value or a column name. How can I do this with data.table?
library(data.table)
df <- data.table(a = c(1:5),
b = c(5:1),
c = c(1, 3, 5, 3, 1))
myfunc <- function(val) {
df[a >= val]
}
# This works:
myfunc(2)
# This does not work:
myfunc("c")
If I define my function as:
myfunc <- function(val) {
df[a >= get(val)]
}
# This doesn't work:
myfunc(2)
# This works:
myfunc("c")
What is the best way to resolve this?
Edit: To be clear, I want the results to be the same as:
# myfunc(2)
df %>%
filter(a >= 2)
# myfunc("c")
df %>%
filter(a >= c)
EDIT:
Thanks all for the responses, I think I like dww's answer the best.
I wish it was as easy as in dplyr, where I can do:
myfunc <- function(val) {
df %>%
filter(a >= {{val}})
}
# Both work:
myfunc(2)
myfunc(c)
If you build and parse the whole expression, then you can evaluate it in its entirety. For example
myfunc <- function(val) {
df[eval(parse(text=paste("a >= ", val)))]
}
Relying on a function that lets you mix values and variable names in the same parameter might be dangerous, though, especially if you actually wanted to match on character values rather than variable names. If you passed in the whole expression, you could do
myfunc <- function(expr) {
expr <- substitute(expr)
df[eval(expr)]
}
myfunc(a>=3)
myfunc(a>=c)
The question did not actually define the desired behavior, so we assume that df must be a data.table, that if a character string is passed then the column of that name should be returned, and that if a number is passed then those rows whose a column exceeds that number should be returned.
Define an S3 generic and methods for character and default.
myfunc <- function(x, data = df) UseMethod("myfunc")
myfunc.character <- function(x, data = df) data[[x]]
myfunc.default <- function(x, data = df) data[a > x]
myfunc(2)
## a b c
## 1: 3 3 5
## 2: 4 2 3
## 3: 5 1 1
myfunc("c")
## [1] 1 3 5 3 1
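If the goal is instead the filtering behaviour from the question's edit (a >= 2 for a number, a >= c for a column name), a minimal sketch is to branch on the argument type and reuse the two forms the question already found to work. The name filter_a is hypothetical, and the sketch assumes the table has no column called val, since that would shadow the argument inside the data.table call.

```r
library(data.table)

df <- data.table(a = 1:5, b = 5:1, c = c(1, 3, 5, 3, 1))

# filter_a is a hypothetical helper, not part of data.table
filter_a <- function(val, data = df) {
  if (is.character(val)) {
    data[a >= get(val)]  # treat a string as a column name
  } else {
    data[a >= val]       # treat anything else as a value
  }
}

by_value  <- filter_a(2)    # rows where a >= 2
by_column <- filter_a("c")  # rows where a >= c
```

This keeps the ambiguity mentioned above: a string can only ever mean a column name, never a character value to match on.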

How to apply a custom function to every value in a dataframe?

I am trying to apply a custom function to every value of a dataframe. Here is the custom function and dataframe:
#function
test_fun <- function(x, y = 1) {
output <- x + y
output
}
#dataframe
df <- data.frame(a = c(1,2,3), b = c(4,5,6))
Now let's say I want to apply test_fun, with y = 2, to every value of df. This method doesn't seem to work:
lapply(df, test_fun(y = 2))
The function is vectorized, so we can apply it directly to the whole data frame
test_fun(df, y = 2)
# a b
#1 3 6
#2 4 7
#3 5 8
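Regarding the OP's error: test_fun(y = 2) calls the function immediately, without x, instead of passing the function itself to lapply. If we are not using a lambda function, pass the extra argument through lapply as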
lapply(df, test_fun, y = 2)
-output
#$a
#[1] 3 4 5
#$b
#[1] 6 7 8
Or wrap the call in a lambda function and set y = 2 inside it
lapply(df, function(vec) test_fun(vec, y = 2))
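Since lapply returns a list, one way to keep the result as a data frame is to assign into df[]. This is a small sketch of that idiom using the function and data from the question:

```r
test_fun <- function(x, y = 1) {
  x + y
}

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Assigning into df[] replaces the columns but keeps the data frame itself
df[] <- lapply(df, test_fun, y = 2)
df
```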

Function in R to rename variables and compute R code

I would like to write a function which takes a list of variables out of a dataframe, say:
df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))
And to compute always the same calculation, say calculate the standard deviation like:
test.function <- function(var){
for (i in var) {
paste0(i, "_per_sd") <- i / sd(i)
}
}
The goal is to create a new variable a_per_sd, which is a divided by its standard deviation. Unfortunately, I am stuck and get the error: Error in paste0(i, "_per_sd") <- i/sd(i) : could not find function "paste0<-".
The expected usage should be:
test.function(df$a, df$b)
The expected result should be:
> df$a_per_sd
[1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777
And for every other variable which was given.
Somehow I think I should use as.formula and/or eval, but I might be thinking about this the wrong way.
Thank you very much for your attention and help.
Is this what you are after?
df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))
test.function <- function(...){
x <- list(...)
xn <- paste0(unlist(eval(substitute(alist(...)))),
"_per_sd")
setNames(lapply(x, function(y) y/sd(y)), xn)
}
cbind(df, test.function(df$a, df$b))
#> a b df$a_per_sd df$b_per_sd
#> 1 1 6 0.6324555 3.794733
#> 2 2 7 1.2649111 4.427189
#> 3 3 8 1.8973666 5.059644
#> 4 4 9 2.5298221 5.692100
#> 5 5 10 3.1622777 6.324555
Created on 2020-07-23 by the reprex package (v0.3.0)
The question is not completely clear to me, but you might get sd of rows/columns or vectors by these approaches:
apply(as.matrix(df), MARGIN = 1, FUN = sd) #across rows
#[1] 3.535534 3.535534 3.535534 3.535534 3.535534
apply(as.matrix(df), MARGIN = 2, FUN = sd) #across columns
# a b
#1.581139 1.581139
lapply(df, sd) #if you provide list of vectors (columns of `df` in this case)
#$a
#[1] 1.581139
#
#$b
#[1] 1.581139
I got this far. Is this what you are looking for?
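test.function <- function(var)
{
# Build the new name from the expression that was passed in, e.g. "df$a_per_sd"
newvar = paste0(deparse(substitute(var)), "_per_sd")
assign(newvar, var/sd(var))
get(newvar)
}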
Input:
test.function(df$a)
Result:
[1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777
I got the idea from here: Assignment using get() and paste()
At the end this is what my code looks like:
test.function <- function(...){
x <- list(...)
xn <- paste0(unlist(eval(substitute(alist(...)))),
"_per_sd")
setNames(lapply(x, function(y) y/sd(y, na.rm = TRUE)), xn)
}
test.function.wrap <- function(..., dataframe) {
assign(deparse(substitute(dataframe)), cbind(dataframe, test.function(...)) , envir=.GlobalEnv)
}
test.function.wrap(df$a, df$b , dataframe = df)
To be able to assign the new variables to the existing dataframe, I put the (absolutely genius) tips together and wrapped the function in another function to do the trick. I am aware it might not be as elegant, but it does the work!
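A possible alternative that avoids assign and the global environment is to pass the data frame and the column names, then return the modified data frame. This is only a sketch; the name add_per_sd and its arguments are hypothetical, not from the answers above.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))

# add_per_sd is a hypothetical helper: for each named column it adds
# a new column <name>_per_sd holding the values divided by their sd
add_per_sd <- function(data, cols) {
  for (col in cols) {
    data[[paste0(col, "_per_sd")]] <- data[[col]] / sd(data[[col]], na.rm = TRUE)
  }
  data
}

df <- add_per_sd(df, c("a", "b"))
df$a_per_sd
```

Unlike the wrapper above, the new columns get plain names (a_per_sd rather than df$a_per_sd), and the caller decides where the result is stored.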

Use paste0 to create multiple object names with a for loop

I would like to create multiple object names with a for loop. I have tried the following which fails horribly:
somevar_1 = c(1,2,3)
somevar_2 = c(4,5,6)
somevar_3 = c(7,8,9)
for (n in length(1:3)) {
x <- as.name(paste0("somevar_",[i]))
x[2]
}
The desired result is x being somevar_1, somevar_2, somevar_3 for the respective iterations, and x[2] being 2, 5 and 8 respectively.
How should I do this?
somevar_1 = c(1,2,3)
somevar_2 = c(4,5,6)
somevar_3 = c(7,8,9)
for (n in 1:3) {
x <- get(paste0("somevar_", n))
print(x[2])
}
Result
[1] 2
[1] 5
[1] 8
We can use mget to get all the required objects in a list and use sapply to subset 2nd element from each of them.
sapply(mget(paste0("somevar_", 1:3)), `[`, 2)
#somevar_1 somevar_2 somevar_3
# 2 5 8
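If you control how the vectors are created, it may be simpler to keep them in one named list rather than in separately numbered objects, so no name lookup with get or mget is needed. A minimal sketch, assuming the same three vectors (somevars and second are illustrative names):

```r
somevars <- list(
  somevar_1 = c(1, 2, 3),
  somevar_2 = c(4, 5, 6),
  somevar_3 = c(7, 8, 9)
)

# Loop over the names and index the list directly
for (nm in names(somevars)) {
  print(somevars[[nm]][2])
}

# Or pull the second element of each vector in one call
second <- vapply(somevars, function(v) v[2], numeric(1))
```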

How do I remove empty data frames from a list?

I've got dozens of lists, each is a collection of 11 data frames. Some data frames are empty (another script did not output any data, not a bug).
I need to push each list through a function, but that chokes when it sees an empty data frame. So how do I write a function that will take a list, do a dim on each element (i.e. data frame) and, if it's 0, skip to the next?
I tried something like this:
empties <- function (mlist)
{
for(i in 1:length(mlist))
{
if(dim(mlist[[i]])[1]!=0) return (mlist[[i]])
}
}
But clearly, that didn't work. I would do this manually at this point but that would take forever. Help?
I'm not sure if this is exactly what you're asking for, but if you want to trim mlist down to contain only non-empty data frames before running the function on it, try mlist[sapply(mlist, function(x) dim(x)[1]) > 0].
E.g.:
R> M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
R> M2 <- data.frame(matrix(nrow = 0, ncol = 0))
R> M3 <- data.frame(matrix(9:12, nrow = 2, ncol = 2))
R> mlist <- list(M1, M2, M3)
R> mlist[sapply(mlist, function(x) dim(x)[1]) > 0]
[[1]]
X1 X2
1 1 3
2 2 4
[[2]]
X1 X2
1 9 11
2 10 12
A slightly simpler and more transparent approach to the sapply/indexing combination is to use the Filter() function:
> Filter(function(x) dim(x)[1] > 0, mlist)
[[1]]
X1 X2
1 1 3
2 2 4
[[2]]
X1 X2
1 9 11
2 10 12
Instead of dim(x)[1] you could make use of nrow, so you could do
mlist[sapply(mlist, nrow) > 0]
Filter(function(x) nrow(x) > 0, mlist)
You could also use keep and discard from purrr
purrr::keep(mlist, ~nrow(.) > 0)
purrr::discard(mlist, ~nrow(.) == 0)
There is also compact in purrr, a wrapper on discard that removes empty elements directly. Note that it treats an element as empty when its length is zero, which for a data frame means zero columns.
purrr::compact(mlist)
If you are ok to filter the list based on number of columns, you could replace nrow with ncol in above answers. Additionally, you could also use lengths to filter the list.
mlist[lengths(mlist) > 0]
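The row-based and column-based tests can disagree, because a data frame can have columns but no rows. A short sketch with made-up example data shows the difference:

```r
mlist <- list(
  full    = data.frame(x = 1:2),        # 2 rows, 1 column
  no_cols = data.frame(),               # 0 rows, 0 columns
  no_rows = data.frame(x = integer(0))  # 0 rows, 1 column
)

row_counts <- sapply(mlist, nrow)  # 2, 0, 0
col_counts <- lengths(mlist)       # 1, 0, 1

# Filtering on rows keeps only "full"
by_rows <- names(Filter(function(d) nrow(d) > 0, mlist))

# Filtering on columns also keeps "no_rows"
by_cols <- names(mlist[lengths(mlist) > 0])
```

Which test is right depends on how the upstream script writes its empty output.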
Adding tidyverse option:
library(tidyverse)
mlist[map_int(mlist, function(x) dim(x)[1]) > 0]
mlist[map_int(mlist, ~dim(.)[1]) > 0]
