I want to apply a function over a data frame. The function takes V1 as arg1 and V2 as arg2 and I want to write the result to V3 or some other vector.
Is there an easy and compact way to do this? I've posted a (non-working) example below.
Thanks
Stu
my.func <- function(X, Y) {
return(X + Y)
}
a <- c(1,2,3)
b <- c(4,5,6)
my.df <- data.frame(a, b)
apply(my.df, 1, my.func, X="a", Y="b")
mapply() is made for this.
Either of the following will do the job. The advantage of the second approach is that it scales nicely to functions that take an arbitrary number of arguments.
mapply(my.func, my.df[,1], my.df[,2])
# [1] 5 7 9
do.call(mapply, c(FUN=list(my.func), unname(my.df)))
# [1] 5 7 9
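To illustrate the "arbitrary number of arguments" point, here is a minimal sketch with a three-argument function. The names my.func3 and my.df3 are made up for this example and are not from the question.

```r
# Hypothetical three-argument function and three-column data frame
my.func3 <- function(X, Y, Z) X + Y * Z
my.df3 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# unname() drops the column names, so the columns are matched to
# X, Y and Z by position rather than by name
res <- do.call(mapply, c(FUN = list(my.func3), unname(my.df3)))
res
# [1] 29 42 57
```

The call itself does not change when a column is added, which is the sense in which this form scales.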
I feel this would be better approached using with than mapply if you're calling elements inside a data.frame:
with(my.df,my.func(X=a,Y=b))
#[1] 5 7 9
It's still quite a clean method even if you need to do the explicit conversion from a matrix:
with(data.frame(my.mat),my.func(X=a,Y=b))
#[1] 5 7 9
There isn't really any need for an *apply function here. Vectorization would suffice:
my.df$c <- my.df$a + my.df$b
# a b c
#1 1 4 5
#2 2 5 7
#3 3 6 9
Your apply solution can't work the way you have written it because apply passes each row to your function as a single plain vector, not as a data frame with columns, so X="a" and Y="b" are just character strings: e.g.
colnames(my.df)
#[1] "a" "b"
apply( my.df , 1 , colnames )
#NULL
For your example, rowSums(my.df) will do the job. For more complicated tasks, you can use the mapply function. For example: mapply(my.func, my.df$a, my.df$b).
Alternatively, you could rewrite your function to take a vector argument:
my.otherfunc <- function(x) sum(x)
apply(my.df, 1, my.otherfunc)
It's important to understand that when apply feeds each row or column into the function, it's sending one vector, not a list of separate entries. So you should give it a function with a single (vector) argument.
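If you would rather keep the original two-argument function, one possible workaround is a small wrapper that unpacks the row vector. This is only a sketch and assumes the columns are named a and b as in the question.

```r
my.func <- function(X, Y) X + Y
my.df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Each row arrives as one vector; the wrapper picks out the two
# entries by name and forwards them as separate arguments
res <- unname(apply(my.df, 1, function(row) my.func(row[["a"]], row[["b"]])))
res
# [1] 5 7 9
```

Keep in mind that apply converts the data frame to a matrix first, so this only behaves well when all columns share one type.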
After searching for some time, I cannot find a smooth R-esque solution.
I have a list of vectors that I want to convert to data frames, adding a column with the names of the vectors. I can't do this with cbind() and melt() into a single data frame because the vectors have different lengths.
Basic example would be:
list<-list(a=c(1,2,3),b=c(4,5,6,7))
var<-"group"
What I have come up with and works is:
list<-lapply(list, function(x) data.frame(num=x,grp=""))
for (j in 1:length(list)){
list[[j]][,2]<-names(list[j])
names(list[[j]])[2]<-var
}
But I am trying to better use lapply() and have cleaner coding practices. Right now I rely so heavily on for and if statements, which a lot of the base functions do already and much more efficiently than I can code at this point.
The pseudo code I would like is something like:
list<-lapply(list, function(x) data.frame(num=x,get(var)=names(x))
Is there a clean way to get this done?
Second closely related question, if I already have a list of dataframes, why is it so hard to reassign column values and names using lapply()?
So using something like:
list<-list(a=data.frame(num=c(1,2,3),grp=""),b=data.frame(num=c(4,5,6,7),grp=""))
var<-"group"
#pseudo code
list<-lapply(list, function(x) x[,2]<-names(x)) #populate second col with name of df[x]
list<-lapply(list, function(x) names[[x]][2]<-var) #set 2nd col name to 'var'
The first line of pseudo code throws an error about matching row lengths. Why does lapply() not just loop over and repeat names(x) like the same function on a single dataframe does in a for loop?
For the second line, as I understand it I can use setNames() to reassign all the column names, but how do I make this work for just one of the col names?
Many thanks for any ideas or pointing to other threads that cover this and helping me understand the behavior of lapply() in this context.
A full R base approach without using loops
> l<-list(a=c(1,2,3),b=c(4,5,6,7))
> data.frame(grp=rep(names(l), lengths(l)), num=unlist(l), row.names = NULL)
grp num
1 a 1
2 a 2
3 a 3
4 b 4
5 b 5
6 b 6
7 b 7
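Another base R option is stack(), which does the same reshaping in one call. The sketch below also uses the var variable from the question to name the grouping column. Note that stack() returns the grouping column as a factor.

```r
l <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7))
var <- "group"

# stack() returns columns "values" and "ind"; rename them afterwards
res <- setNames(stack(l), c("num", var))
res
```

Use split(res, res[[var]]) if you need one data frame per vector rather than a single long one.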
Related to your first/main question you can use the function enframe from package tibble for this purpose
library(tibble)
library(tidyr)
library(dplyr)
l<-list(a=c(1,2,3),b=c(4,5,6,7))
l %>%
enframe(name = "group", value="value") %>%
unnest(value) %>%
group_split(group)
Try this:
library(dplyr)
mylist <- list(a = c(1,2,3), b = c(4,5,6,7))
bind_rows(lapply(names(mylist), function(x) tibble(grp = x, num = mylist[[x]])))
# A tibble: 7 x 2
grp num
<chr> <dbl>
1 a 1
2 a 2
3 a 3
4 b 4
5 b 5
6 b 6
7 b 7
This is essentially a lapply-based solution where you iterate over the names of your list, and not the individual list elements themselves. If you prefer to do everything in base R, note that the above is equivalent to
do.call(rbind, lapply(names(mylist), function(x) data.frame(grp = x, num = mylist[[x]], stringsAsFactors = FALSE)))
Having said that, I would prefer tibbles, as a modern implementation of data.frames, and bind_rows over the do.call(rbind... construct.
As to the second question, note the following:
lapply(mylist, function(x) str(x))
num [1:3] 1 2 3
num [1:4] 4 5 6 7
....
lapply(mylist, function(x) names(x))
$a
NULL
$b
NULL
What you see here is that the function inside lapply receives the elements of mylist. In this case, it gets to work with the numeric vector only. As far as the function called inside lapply is concerned, that vector does not have a name. To highlight this, consider the following:
names(c(1,2,3))
NULL
Which is the same: the vector c(1,2,3) does not have a name attribute.
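The same reasoning applies to the list of data frames in the second question: inside lapply, names(x) gives the column names of each data frame (two values), not the name of the list element, hence the row length error. One way around this is to iterate over the elements and their names in parallel with Map(). This is a sketch only, and dfs is a made-up name for the list in the question.

```r
# Hypothetical input mirroring the question: a named list of data frames
dfs <- list(a = data.frame(num = c(1, 2, 3), grp = ""),
            b = data.frame(num = c(4, 5, 6, 7), grp = ""))
var <- "group"

# Map() walks over the data frames and their names in parallel
dfs <- Map(function(df, nm) {
  df[[2]] <- nm          # recycle the list name down the second column
  names(df)[2] <- var    # rename only the second column
  df                     # return the data frame, not the assignment value
}, dfs, names(dfs))
```

Returning df explicitly matters: a function whose last expression is an assignment returns the assigned value, not the modified data frame.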
For example, changing cumsum in the output of expr(cumsum(1:3)) to cumprod.
Currently the only thing I can think of is converting the output of expr(cumsum(1:3)) to a string, editing it, then changing it back to a function call.
This seems like a pretty poor solution though and I'm struggling to find a better way.
library(rlang)
f(expr(cumsum(1:4)), cumprod)
# [1] 1 2 6 24
This is basically what I'm trying to achieve. Can you help me find a starting point?
If you apply gsub to an expression, R coerces it to a character vector and does the substitution. You can then turn the result back into an expression with parse:
y <- 1:4
x <- expression({cumsum(y)})
x.2 <- gsub("cumsum", "cumprod", x)
class(x.2)
# [1] "character"
x.2 <- parse(text = x.2)
eval(x)
# [1] 1 3 6 10
eval(x.2)
# [1] 1 2 6 24
Here is an option using rlang
f <- function(ex, fn) {
ex1 <- as.character(ex)
fn <- enquo(fn)[-1]
eval_tidy(parse_expr(glue::glue('{fn}({ex1[-1]})')))
}
f(expr(cumsum(1:4)), cumprod)
#[1] 1 2 6 24
Note that if you replaced cumsum with cumprod the output would be a vector 4 long, not 24, so we assume you meant to replace it with prod.
We use substitute to substitute cumsum with the value of the cumsum argument and then evaluate the resulting expression.
f here uses no packages -- the input in the question uses expr from rlang but even that is not really needed since we could have used quote(...) in place of expr(...).
f <- function(.x, cumsum) eval.parent(do.call("substitute", list(.x)))
# test
f(expr(cumsum(1:4)), prod)
## [1] 24
f(expr(cumsum(1:4)), cumprod)
## [1] 1 2 6 24
I like @David Arenburg's answer, so I'm posting it here and marking it.
It's not clear to me how you decide which function you want to replace (because : is also a function). But if you always want to replace the outer one, you could define the following function
f <- function(x, y) {
tmp <- substitute(x)
tmp[[1]] <- substitute(y)
eval(tmp)
}
and then use it as follows
f(cumsum(1:4), cumprod)
#[1] 1 2 6 24
– David Arenburg
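The same idea also works when the input is already a quoted call, as in the question. A minimal sketch in base R, with swap_head as a made-up helper name:

```r
# Hypothetical helper: takes an already-quoted call plus a replacement function
swap_head <- function(ex, fn) {
  ex[[1]] <- substitute(fn)   # overwrite the function position of the call
  eval(ex, parent.frame())    # evaluate in the caller's environment
}

res <- swap_head(quote(cumsum(1:4)), cumprod)
res
# [1] 1 2 6 24
```

A call object can be indexed like a list, and its first element is the function being called, so no string editing is needed.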
How does an lapply call have to be structured to pull out specific objects by index? I have a list of lists. I now want to get the 2nd, 4th and 5th element of each sublist and put them into a data frame. I thought the easiest way would be to use lapply and simply get the entries like this:
list <- lapply(ll, function(x) { x[[2]]; x[[4]]; x[[5]] })
But that doesn't seem to work.
this will work:
ll <- list(as.list(1:10),
as.list(11:20),
as.list(21:30))
library(magrittr)
output1 <- ll %>% sapply(function(x){c(x[[2]],x[[4]],x[[5]])}) %>% t %>% as.data.frame
# or with base syntax:
output2 <- as.data.frame(t(sapply(ll,function(x){c(x[[2]],x[[4]],x[[5]])})))
# V1 V2 V3
# 1 2 4 5
# 2 12 14 15
# 3 22 24 25
Your function returns the result of its last expression, which in your case is x[[5]]. The results of the two expressions before it are lost.
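A small side-by-side sketch of that behaviour, using the same ll as above:

```r
ll <- list(as.list(1:10), as.list(11:20), as.list(21:30))

# Only the value of the final expression, x[[5]], is returned
last_only <- sapply(ll, function(x) { x[[2]]; x[[4]]; x[[5]] })
last_only
# [1]  5 15 25

# Combining the three values into one vector keeps all of them;
# the result is a 3 x 3 matrix with one column per sublist
all_three <- sapply(ll, function(x) c(x[[2]], x[[4]], x[[5]]))
```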
Not sure what you want this data.frame to look like, but you can extract the 2, 4, and 5 elements with
lapply(ll, `[`, c(2,4,5))
and if you wanted to turn those into rows, you could do
do.call("rbind",lapply(ll, `[`, c(2,4,5)))
If you wanted them to become columns, you could do
data.frame(sapply(ll, `[`, c(2,4,5)))
I have R objects:
"debt_30_06_2010" "debt_30_06_2011" "debt_30_06_2012" ...
and need to call them using a function:
paste0("debt_",date) ## "date" being another object
The problem is that when I assign the call to another object it takes only the name not the content:
debt_a <- paste0("debt_", date)
> debt_a
[1] "debt_30_06_2014"
I've tried to use the function "assign" without success:
assign("debt_a", paste0("debt_", date))
> debt_a
[1] "debt_30_06_2014"
I would like to know if there is any method to achieve this task.
We could use get to get the value of the object. If there are multiple objects, use mget. For example, here I am assigning 'debt_a' with the value of 'debt_30_06_2010'
assign('debt_a', get(paste0('debt_', date[1])))
debt_a
#[1] 1 2 3 4 5
mget returns a list. So if we are assigning 'debt_a' to multiple objects,
assign('debt_a', mget(paste0('debt_', date)))
debt_a
#$debt_30_06_2010
#[1] 1 2 3 4 5
#$debt_30_06_2011
#[1] 6 7 8 9 10
data
debt_30_06_2010 <- 1:5
debt_30_06_2011 <- 6:10
date <- c('30_06_2010', '30_06_2011')
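If you need several of these objects, it may be more convenient to keep the list that mget returns and index it by the constructed name instead of calling assign each time. A sketch under the same example data (debts and second are made-up names):

```r
debt_30_06_2010 <- 1:5
debt_30_06_2011 <- 6:10
date <- c("30_06_2010", "30_06_2011")

# Collect all objects into one named list, then index by the built name
debts <- mget(paste0("debt_", date))
second <- debts[[paste0("debt_", date[2])]]
second
# [1]  6  7  8  9 10
```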
I'm not sure if I understood your question correctly, but I suspect that your objects are names of functions, and that you want to construct these names as characters to use the functions. If this is the case, this example might help:
myfun <- function(x){sin(x)**2}
mychar <- paste0("my", "fun")
eval(call(mychar, x = pi / 4))
#[1] 0.5
#> identical(eval(call(mychar, x = pi / 4)), myfun(pi / 4))
#[1] TRUE
I have this example data
library(quantmod)
getSymbols("NOK",from="2013-01-01",to="2014-05-01",src="yahoo","getSymbols.warning4.0"=FALSE)
data<-NOK
w1<-1
L_dO<-data[,1]
L_dC<-data[,4]
L_Profit_L_1<-((lag(L_dC,-1)-lag(L_dO,-1))/(lag(L_dO,-1)))*100
L_Profit_L_2<-((lag(L_dC,-2)-lag(L_dO,-1))/(lag(L_dO,-1)))*100
L_Profit_L_3<-((lag(L_dC,-3)-lag(L_dO,-1))/(lag(L_dO,-1)))*100
L_Profit_L_4<-((lag(L_dC,-4)-lag(L_dO,-1))/(lag(L_dO,-1)))*100
L_Profit_L_5<-((lag(L_dC,-5)-lag(L_dO,-1))/(lag(L_dO,-1)))*100
L_Profit_L_all<-ifelse(L_Profit_L_1>w1,L_Profit_L_1,
ifelse(L_Profit_L_2>w1,L_Profit_L_2,
ifelse(L_Profit_L_3>w1,L_Profit_L_3,
ifelse(L_Profit_L_4>w1,L_Profit_L_4,
ifelse(L_Profit_L_5>w1,L_Profit_L_5,L_Profit_L_5)))))
What I am interested in is L_Profit_L_all, but this seems a rather strange and slow way to write it. I have tried to vectorize it like
L_Profit_L_all<-ifelse(c(L_Profit_L_1>w1,L_Profit_L_2>w1,L_Profit_L_3>w1,L_Profit_L_4>w1,L_Profit_L_5>w1),c(L_Profit_L_1,L_Profit_L_2,L_Profit_L_3,L_Profit_L_4,L_Profit_L_5),L_Profit_L_5)
But the result is not the same. I want it to work in the right order, i.e. if the first condition is TRUE, return the first value and ignore whether any later condition is also TRUE, which is what the first code does.
Is there a straightforward way to achieve this? I have a huge dataset, so every ms saved counts. Thanks
Here's an example how you might approach the problem without any ifelses. Assume you cbind all your L_Profit_L_X vectors together to get something similar to m in my example.
set.seed(1)
m <- matrix(sample(-5:5, 50, T), ncol = 5)
indx <- max.col(m > 1, ties.method = "first")
sapply(seq_along(indx), function(i) m[i, indx[i]])
#[1] 5 2 2 4 3 4 5 2 4 3
This is not fully vectorized since we're using sapply but I'm sure it will be a lot faster than the initial approach with 5 nested ifelses.
Update
You can vectorize the code by replacing the sapply part above with:
m[cbind(seq_len(nrow(m)), indx)]
# [1] 5 2 2 4 3 4 5 2 4 3
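One detail to watch: when no value in a row exceeds the threshold, max.col with ties.method = "first" picks column 1, whereas the nested ifelse in the question falls through to the last vector. The sketch below adds that fallback. The matrix is made up for illustration, and NA values (which lag produces at the ends of the series) are not handled here.

```r
# Hypothetical small matrix; row 2 has no value above the threshold
m <- matrix(c(0, 3,  5,
              1, 0, -2,
              4, 9,  2), ncol = 3, byrow = TRUE)
w1 <- 1

hit <- m > w1
indx <- max.col(hit, ties.method = "first")
indx[rowSums(hit) == 0] <- ncol(m)   # no hit: fall back to the last column

res <- m[cbind(seq_len(nrow(m)), indx)]
res
# [1]  3 -2  4
```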