How to apply a custom function to every value in a dataframe? - r

I am trying to apply a custom function to every value of a dataframe. Here is the custom function and dataframe:
#function
test_fun <- function(x, y = 1) {
output <- x + y
output
}
#dataframe
df <- data.frame(a = c(1,2,3), b = c(4,5,6))
Now let's say I want to apply test_fun, with y = 2, to every value of df. This method doesn't seem to work:
lapply(df, test_fun(y = 2))

The function is vectorized, so we can apply it directly to the whole data frame:
test_fun(df, y = 2)
# a b
#1 3 6
#2 4 7
#3 5 8
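Regarding the OP's error: test_fun(y = 2) calls the function immediately, without x, instead of passing the function itself to lapply. If we are not using a lambda (anonymous) function, pass the extra argument through lapply: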
lapply(df, test_fun, y = 2)
Output:
#$a
#[1] 3 4 5
#$b
#[1] 6 7 8
Or wrap the call in a lambda function and set y = 2 there:
lapply(df, function(vec) test_fun(vec, y = 2))
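As a minimal sketch, reusing test_fun and df from the question, the forms below give the same values. The df[] <- lapply(...) idiom is an addition that does not appear in the answer above; it is one common way to keep the data frame structure instead of getting a plain list back.

```r
test_fun <- function(x, y = 1) {
  x + y
}
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Vectorized call on the whole data frame
res_direct <- test_fun(df, y = 2)

# Arguments after the function name are forwarded to test_fun by lapply
res_list <- lapply(df, test_fun, y = 2)

# Assigning into res_df[] keeps the data.frame class and column names
res_df <- df
res_df[] <- lapply(df, test_fun, y = 2)
```

res_list is a list with elements a and b, while res_direct and res_df are data frames.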

Related

Function in R to rename variables and compute R code

I would like to write a function which takes a list of variables out of a dataframe, say:
df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))
And to compute always the same calculation, say calculate the standard deviation like:
test.function <- function(var){
for (i in var) {
paste0(i, "_per_sd") <- i / sd(i)
}
}
The goal is to create a new variable a_per_sd, which is a divided by its standard deviation. Unfortunately, I am stuck and get this error: Error in paste0(i, "_per_sd") <- i/sd(i) : could not find function "paste0<-".
The expected usage should be:
test.function(df$a, df$b)
The expected result should be:
> df$a_per_sd
[1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777
And for every other variable which was given.
Somehow I think I should use as.formula and/or eval, but I might be on the wrong track.
Thank you very much for your attention and help.
Is this what you are after?
df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))
test.function <- function(...){
x <- list(...)
xn <- paste0(unlist(eval(substitute(alist(...)))),
"_per_sd")
setNames(lapply(x, function(y) y/sd(y)), xn)
}
cbind(df, test.function(df$a, df$b))
#> a b df$a_per_sd df$b_per_sd
#> 1 1 6 0.6324555 3.794733
#> 2 2 7 1.2649111 4.427189
#> 3 3 8 1.8973666 5.059644
#> 4 4 9 2.5298221 5.692100
#> 5 5 10 3.1622777 6.324555
Created on 2020-07-23 by the reprex package (v0.3.0)
The question is not completely clear to me, but you might get sd of rows/columns or vectors by these approaches:
apply(as.matrix(df), MARGIN = 1, FUN = sd) #across rows
#[1] 3.535534 3.535534 3.535534 3.535534 3.535534
apply(as.matrix(df), MARGIN = 2, FUN = sd) #across columns
# a b
#1.581139 1.581139
lapply(df, sd) #if you provide list of vectors (columns of `df` in this case)
#$a
#[1] 1.581139
#
#$b
#[1] 1.581139
I got this far. Is this what you are looking for?
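test.function <- function(var)
{
# Build the new name from the expression that was passed in, e.g. "df$a_per_sd"
newvar = paste0(deparse(substitute(var)), "_per_sd")
# Store the scaled values under that name (local to the function)
assign(newvar, var/sd(var))
get(newvar)
}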
Input:
test.function(df$a)
Result:
[1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777
I got the idea from here: Assignment using get() and paste()
At the end this is what my code looks like:
test.function <- function(...){
x <- list(...)
xn <- paste0(unlist(eval(substitute(alist(...)))),
"_per_sd")
setNames(lapply(x, function(y) y/sd(y, na.rm = TRUE)), xn)
}
test.function.wrap <- function(..., dataframe) {
assign(deparse(substitute(dataframe)), cbind(dataframe, test.function(...)) , envir=.GlobalEnv)
}
test.function.wrap(df$a, df$b , dataframe = df)
To be able to assign the new variables to the existing dataframe, I put the (absolutely genius) tips together and wrapped the function in another function to do the trick. I am aware it might not be as elegant, but it does the work!
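A minimal sketch of an alternative that avoids writing into the global environment, assuming it is acceptable to pass the column names as strings rather than as df$a, df$b. The helper name add_per_sd is hypothetical and not part of the answers above.

```r
# Hypothetical helper: appends <col>_per_sd for each requested column
add_per_sd <- function(data, cols) {
  for (col in cols) {
    new_name <- paste0(col, "_per_sd")
    # [[ ]] accepts a string, so the column name can be built at run time
    data[[new_name]] <- data[[col]] / sd(data[[col]], na.rm = TRUE)
  }
  data
}

df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))
df <- add_per_sd(df, c("a", "b"))
names(df)
```

Because the function returns the modified data frame, the caller decides where the result is stored.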

Use paste0 to create multiple object names with a for loop

I would like to create multiple object names with a for loop. I have tried the following which fails horribly:
somevar_1 = c(1,2,3)
somevar_2 = c(4,5,6)
somevar_3 = c(7,8,9)
for (n in length(1:3)) {
x <- as.name(paste0("somevar_",[i]))
x[2]
}
The desired result is x being somevar_1, somevar_2, somevar_3 for the respective iterations, and x[2] being 2, 5 and 8 respectively.
How should I do this?
somevar_1 = c(1,2,3)
somevar_2 = c(4,5,6)
somevar_3 = c(7,8,9)
for (n in 1:3) {
x <- get(paste0("somevar_", n))
print(x[2])
}
Result
[1] 2
[1] 5
[1] 8
We can use mget to get all the required objects in a list and use sapply to subset 2nd element from each of them.
sapply(mget(paste0("somevar_", 1:3)), `[`, 2)
#somevar_1 somevar_2 somevar_3
# 2 5 8
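If the vectors do not have to live in separate variables, a named list avoids get and mget altogether. This is only a sketch of that alternative; the names somevars and second are made up for illustration.

```r
# Keep the vectors together in one named list
somevars <- list(
  somevar_1 = c(1, 2, 3),
  somevar_2 = c(4, 5, 6),
  somevar_3 = c(7, 8, 9)
)

# Loop over the names and index into the list with [[ ]]
for (nm in names(somevars)) {
  x <- somevars[[nm]]
  print(x[2])
}

# Or collect the second element of each vector in one step
second <- vapply(somevars, function(v) v[2], numeric(1))
```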

Trouble applying function to data frame

Toy example:
> myfn = function(a,x){sum(a*x)}
> myfn(a=2, x=c(1,2,3))
[1] 12
Good so far. Now:
> df = data.frame(a=c(4,5))
> df$ans = myfn(a=df$a, x=c(1,2,3))
Warning message:
In a * x : longer object length is not a multiple of shorter object length
> df
a ans
1 4 26
2 5 26
What I want is for the first row to behave as if I had called myfn(a=4, x=c(1,2,3)), giving an answer of 24, and for the second row to behave as if I had called myfn(a=5, x=c(1,2,3)), giving an answer of 30. How do I do this? Thank you.
EDIT: slightly more complex version. Now suppose that the function is
myfn = function(a,b, x){sum((a+b)*x)}
and that I have the data frame
df = data.frame(a=c(4,5), b=c(6,7), c=c(9,9))
I want to create df$ans such that, for the first row, it is as if I called myfn(a=4, b=6, x=c(1,2,3)), and for the second row it is as if I called myfn(a=5, b=7, x=c(1,2,3)). That is, use df$a for a, df$b for b, and ignore df$c.
Something like this would work:
myfn = function(a,x){
return(sum(a*x))
}
df <- data.frame(a=c(4,5))
df$ans <- apply(df, 1, myfn, x = c(1,2,3))
df
a ans
1 4 24
2 5 30
** Edited Based On User Edit **
df = data.frame(a=c(4,5), b=c(6,7), c=c(9,9))
df$ans <- apply(df[, c("a", "b")], 1, function(y) sum((y['a']+y['b'])*c(1,2,3)))
a b c ans
1 4 6 9 60
2 5 7 9 72
There are several ways this can be done, each with its own charms. If you don't want to modify the function I would just do
mapply(myfn, df$a, df$b, MoreArgs = list(x = 1:3))
Alternatively, you can bake the iteration right into the function, e.g.,
myfn = function(a,b, x){
sapply(a+b, function(ab) {
sum(ab*x)
})
}
myfn(df$a, df$b, 1:3)
That's probably the way I would do it.
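The 26 in the question comes from recycling: a = c(4, 5) is stretched against x = c(1, 2, 3), so the sum is 4*1 + 5*2 + 4*3 = 26, and that single value is then repeated down the column. A minimal sketch of calling the function once per row instead, using the data from the question; the names myfn2, ans1 and ans2 are only for illustration.

```r
myfn <- function(a, x) sum(a * x)
df <- data.frame(a = c(4, 5), b = c(6, 7), c = c(9, 9))

# One call per element of df$a; x is forwarded unchanged to every call
ans1 <- vapply(df$a, myfn, numeric(1), x = c(1, 2, 3))

# Two-column variant from the EDIT: walk over a and b in parallel
myfn2 <- function(a, b, x) sum((a + b) * x)
ans2 <- mapply(myfn2, df$a, df$b, MoreArgs = list(x = c(1, 2, 3)))
```

vapply checks that every call returns a single number, which can catch a function that is accidentally vectorized the wrong way.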

R: How do I use a range stored in a list as part of a function in a for loop

I have the following code:
x = c(1,2,5,6)
x = rbind(x,c(4,3,6,5))
x = rbind(x,c(3,7,6,5))
y = 1:2
z = 3:4
variables = list(y,z)
p = 1
for(i in variables){
assign(paste("tbl",p,sep="")) = table(x[,i])
p = p + 1
}
tbl1
tbl2
How do I get the for loop to recognize i as 1:2 and 3:4 so that I get 2 tables returned from the loop named "tbl1" and "tbl2"?
Thanks!
I am not sure what output you expect, so I can just guess. First, I assumed that you meant to cbind, rather than rbind the vectors in x. Then I further assumed that the indices of x within your table-operation referred to the rows rather than the columns (otherwise the indices would be out of bounds).
Here is what you could do:
## Your data:
x <- cbind(c(1,2,5,6), c(4,3,6,5), c(3,7,6,5))
## Your indices
variables <- list(y = 1:2, z = 3:4)
## Your loop:
for (i in seq_along(variables)){
assign(paste0("tbl", i), table(x[variables[[i]],]))
}
tbl1
#
# 1 2 3 4 7
# 1 1 2 1 1
tbl2
#
# 5 6
# 3 3
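If the rbind layout from the question is what was intended, x has 3 rows and 4 columns, so the column ranges 1:2 and 3:4 can be used directly. The sketch below makes that assumption and stores the tables in a named list rather than creating tbl1 and tbl2 with assign; the name tbls is made up for illustration.

```r
# Data as built in the question: 3 rows, 4 columns
x <- rbind(c(1, 2, 5, 6), c(4, 3, 6, 5), c(3, 7, 6, 5))

# Each list element is a range of column indices
variables <- list(1:2, 3:4)

# One table per range, collected in a list
tbls <- lapply(variables, function(cols) table(x[, cols]))
names(tbls) <- paste0("tbl", seq_along(tbls))

tbls$tbl1
tbls$tbl2
```

Selecting columns 1:2 of the rbind matrix picks the same values as selecting rows 1:2 of the cbind matrix in the answer above, so the counts agree.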

R - Looping through datasets and change column names

I'm trying to loop through a bunch of datasets and change columns in R.
I have a bunch of datasets, say a,b,c,etc, and all of them have three columns, say X, Y, Z.
I would like to change their names to be a_X, a_Y, a_Z for dataset a, and b_X, b_Y, b_Z for dataset b, and so on.
Here's my code:
name.list = ("a","b","c")
for(i in name.list){
names(i) = c(paste(i,"_X",sep = ""),paste(i,"_Y",sep = ""),paste(i,"_Y",sep = ""));
}
However, the code above doesn't work since i is in text format.
I've considered assign function but doesn't seem to fit as well.
I would appreciate any ideas.
Something like this:
# Map passes each data frame together with its name (nn)
list2env(Map(function(dat, nn){
colnames(dat) <- paste(nn,colnames(dat),sep='_')
dat
}, mget(name.list), name.list),.GlobalEnv)
for ( i in name.list) {
assign(i, setNames( get(i), paste(i, names(get(i)), sep="_")))
}
> a
a_X a_Y a_Z
1 1 3 A
2 2 4 B
> b
b_X b_Y b_Z
1 1 3 A
2 2 4 B
> c
c_X c_Y c_Z
1 1 3 A
2 2 4 B
Here's some free data:
a <- data.frame(X = 1, Y = 2, Z = 3)
b <- data.frame(X = 4, Y = 5, Z = 6)
c <- data.frame(X = 7, Y = 8, Z = 9)
And here's a method that uses mget and a custom function foo
name.list <- c("a", "b", "c")
foo <- function(x, i) setNames(x, paste(name.list[i], names(x), sep = "_"))
list2env(Map(foo, mget(name.list), seq_along(name.list)), .GlobalEnv)
a
# a_X a_Y a_Z
# 1 1 2 3
b
# b_X b_Y b_Z
# 1 4 5 6
c
# c_X c_Y c_Z
# 1 7 8 9
You could also avoid get or mget by putting a, b, and c into their own environment (or even a list). You also wouldn't need the name.list vector if you go this route, because it's the same as ls(e)
e <- new.env()
e$a <- a; e$b <- b; e$c <- c
bar <- function(x, y) setNames(x, paste(y, names(x), sep = "_"))
# sorted = TRUE makes the list order match ls(e), which is sorted by default
list2env(Map(bar, as.list(e, sorted = TRUE), ls(e)), .GlobalEnv)
Another perk of doing it this way is that you still have the untouched data frames in the environment e. Nothing was overwritten (check a versus e$a).
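The list variant mentioned above can be sketched as follows, assuming the data frames can be created in (or moved into) a named list. The names dfs and renamed are made up for illustration.

```r
# Keep the data frames in one named list instead of separate variables
dfs <- list(
  a = data.frame(X = 1, Y = 2, Z = 3),
  b = data.frame(X = 4, Y = 5, Z = 6),
  c = data.frame(X = 7, Y = 8, Z = 9)
)

# Prefix every column name with the name of its list element
renamed <- Map(
  function(d, nm) setNames(d, paste(nm, names(d), sep = "_")),
  dfs, names(dfs)
)

names(renamed$a)
```

The originals in dfs stay untouched, and renamed can still be pushed into the global environment with list2env if separate objects are needed.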
