I have a very large dataframe...
v.l.df <- data.frame(seq(0, 10, 0.0001),seq(0, 10, 0.0001),seq(0, 10, 0.0001))
...and a function with some if statements and calculations...
a.f <- function(cell_value, action){
  if(action == 1){
    cell_value * 1
  } else if(action == 2){
    cell_value * 5
  }
}
I now want to apply this function to the first two columns of v.l.df row by row and sum the returned values. The new columns should thus contain (pseudo code):
new_col_1 new_col_2
a.f(v.l.df[1,1],1) + a.f(v.l.df[1,2],1) a.f(v.l.df[1,1],2) + a.f(v.l.df[1,2],2)
a.f(v.l.df[2,1],1) + a.f(v.l.df[2,2],1) a.f(v.l.df[2,1],2) + a.f(v.l.df[2,2],2)
...
How can this be achieved? I am struggling with passing the multiple arguments when using apply and with summing the values returned from the function.
EDIT: Changed the example function. It should now return the following:
> a.f(2,1)
[1] 2
> a.f(2,2)
[1] 10
I'd do this in a couple of steps. You can reduce to fewer steps, but I prefer to keep it more readable:
First, apply a.f to every cell of the first two columns of v.l.df, twice: once with action=1 and once with action=2 (to pass additional arguments inside apply, just put them after FUN):
action.1 = apply(v.l.df[,1:2], c(1,2), FUN = a.f, action=1)
action.2 = apply(v.l.df[,1:2], c(1,2), FUN = a.f, action=2)
Then apply rowSums to both action.1 and action.2 and store the results in the same data.frame:
v.l.df$new.1 = rowSums(action.1) #or v.l.df$new.1 = apply(action.1,1,sum)
v.l.df$new.2 = rowSums(action.2) #or v.l.df$new.2 = apply(action.2,1,sum)
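If you prefer a single pass, the two steps can be folded into one loop over the action codes (a sketch that gives the same result as above; note that the cell-by-cell apply may be slow on a data frame of this size):
for (act in 1:2) {
  v.l.df[[paste0("new.", act)]] <- rowSums(apply(v.l.df[, 1:2], c(1, 2), a.f, action = act))
}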
I believe your result is achieved by:
v.l.df$new_col_1 <- a.f(v.l.df$V1, 1) + a.f(v.l.df$V2, 1)
v.l.df$new_col_2 <- a.f(v.l.df$V1, 2) + a.f(v.l.df$V2, 2)
Assuming your first two columns are named V1 and V2 respectively.
You may also define another function
a.f.2 <- function(val1, val2, method) {
a.f(val1, method) + a.f(val2, method)
}
And apply it as follows
v.l.df$new_col_1 <- a.f.2(v.l.df$V1, v.l.df$V2, 1)
v.l.df$new_col_2 <- a.f.2(v.l.df$V1, v.l.df$V2, 2)
You can write a summary function with a ... argument, so it can take an arbitrary number of inputs. The example below expects (and does not check for) columns of a data frame:
a.f.n <- function(method, ...){
  rowSums(sapply(..., a.f, method))
}
Then apply it as follows (here to the first two columns of the example data; the same call works for any number of columns):
v.l.df$new_col_1 <- a.f.n(v.l.df[, 1:2], method = 1)
v.l.df$new_col_2 <- a.f.n(v.l.df[, 1:2], method = 2)
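As a quick sanity check on the example data, the result should match summing the two direct calls:
head(a.f.n(v.l.df[, 1:2], method = 2))            # via the summary function
head(a.f(v.l.df[, 1], 2) + a.f(v.l.df[, 2], 2))   # direct sum, should be identical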
I am not sure how efficient this will be, but it is compact. :-)
Related
I have a function in R that takes two vectors and an integer like this:
MyFunction<-function(MyLoot, DungeonLoot, DungeonOrder)
{
# Perfect Match
PerfectMatch = MyLoot[1]==DungeonLoot[1]
# Group Match
GroupandClassMatch = (MyLoot[2]==DungeonLoot[2])&(MyLoot[3]>=DungeonLoot[3])
# Order Match
OrderMatch = MyLoot[5]==DungeonOrder
# Order Match +1
OrderMatchPlusOne = (OrderMatch)&(MyLoot[8]==1)
# Final Score
Score = PerfectMatch*1 + GroupandClassMatch*1 + OrderMatch*2 + OrderMatchPlusOne*1
return(Score)
}
Now I want to apply MyFunction to two matrices Matrix1 and Matrix2 across their rows so that I have a vector that looks something like:
c(MyFunction(Matrix1[1,], Matrix2[1,], 12), MyFunction(Matrix1[2,], Matrix2[2,], 12), ..., MyFunction(Matrix1[n,], Matrix2[n,], 12))
What is the best (and most efficient) way of doing this? I could use a for loop, but I am wondering if there is a better way, e.g. using one of the apply, sapply, mapply, or lapply functions.
Your newly posted function is still easily vectorized:
MyFunction <- function(MyLoot, DungeonLoot, DungeonOrder) {
  # Perfect Match (column 1 of every row, compared at once)
  PerfectMatch <- MyLoot[, 1] == DungeonLoot[, 1]
  # Group Match
  GroupandClassMatch <- (MyLoot[, 2] == DungeonLoot[, 2]) & (MyLoot[, 3] >= DungeonLoot[, 3])
  # Order Match
  OrderMatch <- MyLoot[, 5] == DungeonOrder
  # Order Match +1
  OrderMatchPlusOne <- OrderMatch & (MyLoot[, 8] == 1)
  # Final Score (one score per row)
  Score <- PerfectMatch + GroupandClassMatch + OrderMatch*2 + OrderMatchPlusOne
  return(Score)
}
MyFunction(Matrix1, Matrix2, 12)
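As a rough check with made-up toy matrices (8 columns, so that positions 1, 2, 3, 5 and 8 exist), the vectorised call returns one score per row, and feeding it one row at a time gives the same numbers:
set.seed(42)
Matrix1 <- matrix(sample(1:3, 4 * 8, replace = TRUE), nrow = 4)
Matrix2 <- matrix(sample(1:3, 4 * 8, replace = TRUE), nrow = 4)
MyFunction(Matrix1, Matrix2, 12)                      # whole matrices at once
sapply(seq_len(nrow(Matrix1)), function(i)            # row by row, for comparison
  MyFunction(Matrix1[i, , drop = FALSE], Matrix2[i, , drop = FALSE], 12))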
Example data along with the expected output would have helped to better understand the question. However, I think you don't need any loop or apply functions to do this. If your input is a matrix, you should be able to do this directly.
Try,
rowSums(Matrix1 == Matrix2)
#For only specific columns
#rowSums(Matrix1[, 1:2] == Matrix2[, 1:2])
This achieved what I wanted - not sure if it is the most computationally efficient solution though.
t(mapply(function(x,y) MyFunction(x,y,11), split(MyLoot, col(MyLoot)), split(DungeonLoot, col(DungeonLoot))))
I have a function of this form:
foo <- function(x, y){
  if(length(y) == 1){
    return(x * y)
  } else {
    return(x - y[1] * y[2])
  }
}
and for the y argument I pass either a number or a vector of numbers:
> #test function:
> foo(1,2)
[1] 2
> foo(1,c(1,2))
[1] -1
Now I wish to use mapply with this function, but I run into problems when I want to pass a vector for the y argument:
df<-data.frame(
"a"<-floor(runif(6, 1,10)),
"b"<-floor(runif(6, 18,80)),
"c"<-floor(runif(6, 1,80)),
"d"<-floor(runif(6, 100,800)),
"e"<-floor(runif(6, 1000,4000)),
"f"<-floor(runif(6, 1,10)),
"g"<-floor(runif(6, 5,80))
)
names(df)=c("a","b","c","d","e","f","g")
The following works fine:
> mapply(FUN=foo,df["a"],df["b"])
but I run into trouble when I try to do the following:
> mapply(FUN=foo,df["a"],cbind(df["b"],df["c"]))
I'd be very grateful for tips on how to better handle an argument of varying length, or on how to pass such an argument to mapply!
There are a lot of possible fixes here. Fundamentally, you need to turn the second input to mapply into a list whose elements each contain two values. One way to achieve that is to do something like:
tmp <- as.data.frame(t(df[c('b', 'c')]))
result <- mapply(FUN=foo,df["a"], tmp)
since a data frame is a list. This runs the function on all combinations of df["a"] and the columns of tmp; the elements you want lie along the diagonal (the 1st element of df["a"] with the first column of tmp, and so on), so the final answer is
diag(result)
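If you'd rather not compute the full grid and then take the diagonal, another option (a sketch, using asplit(), available in base R from 3.6 on) is to hand foo one row's worth of b and c at a time:
pairs <- asplit(as.matrix(df[c("b", "c")]), 1)   # list with one length-2 vector per row
mapply(FUN = foo, df$a, pairs)                   # foo(a[i], c(b[i], c[i])) for each row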
BTW, when you are inside a function call such as data.frame(), use = for assigning arguments instead of <-. You also do not need the quotes around the letters (they are being ignored), so your call to data.frame should look like
df<-data.frame(
a = floor(runif(6, 1,10)),
b = floor(runif(6, 18,80)),
c = floor(runif(6, 1,80)),
d = floor(runif(6, 100,800)),
e = floor(runif(6, 1000,4000)),
f = floor(runif(6, 1,10)),
g = floor(runif(6, 5,80))
)
This lets you avoid having to set the names after you define the data frame.
Update without diagonal call
f1 <- function(x) {
  if(length(x) == 2) x[1] * x[2]
  else x[1] - x[2] * x[3]
}
apply(df[,c("a","b", "c")], 1, f1)
I want to multiply and then sum the unique pairs of a vector, excluding pairs made of the same element, such that for c(1:4):
(1*2) + (1*3) + (1*4) + (2*3) + (2*4) + (3*4) == 35
The following code works for the example above:
x <- c(1:4)
bar <- NULL
for( i in 1:length(x)) { bar <- c( bar, i * c((i+1) : length(x)))}
sum(bar[ 1 : (length(bar) - 2)])
However, my actual data is a vector of rational numbers, not integers, so the (i+1) portion of the loop will not work. Is there a way to look at the next element of the set after i, e.g. j, so that I could write i * c(j : length(x))?
I understand that for loops are usually not the most efficient approach, but I could not think of how to accomplish this via apply etc. Examples of that would be welcome, too. Thanks for your help.
An alternative to a loop would be to use combn and multiply the combinations using the FUN argument. Then sum the result:
sum(combn(x = 1:4, m = 2, FUN = function(x) x[1] * x[2]))
# [1] 35
Even better, use prod in FUN, as suggested by @bgoldst:
sum(combn(x = 1:4, m = 2, FUN = prod))
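For long vectors you can also skip generating the pairs altogether: the sum of products over unique pairs equals ((sum of x)^2 - sum of x^2) / 2, which works just as well for non-integer values:
x <- 1:4
(sum(x)^2 - sum(x^2)) / 2
# [1] 35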
I have a list of filtering functions f1, f2, f3, f4, ... which take a matrix m and a number of options as input and return a subset of the rows of the matrix as output. Now I would like to be able to define, in an orderly way, some meta-filtering function settings metaf1, metaf2, metaf3, ... which would specify the sequential application of a specified number of filtering functions, e.g. first f2 and then f3, using given options for each. I would like to store these filtering settings in a list of, say, class "metafiltering", and then have another function apply the filtering steps specified in a given metafiltering object. The idea is that filtering settings could in this way be stored and applied in an orderly way. How would I achieve this most elegantly in R? Or are there perhaps other convenient methods to achieve something like this?
EDIT: to give an example, say I have matrix
m=replicate(10, rnorm(20))
and filtering functions (these are just examples, obviously mine are more complicated :-) )
f1 = function(m, opt1, opt2) {
  return(m[(m[, 2] > opt1) & (m[, 1] > opt2), ])
}
f2 = function(m, opt1) {
  return(m[(m[, 3] > opt1), ])
}
And I have defined the following metafiltering settings of a specific class, which would specify two functions to be applied sequentially to matrix m:
metafilterfuncs=list(fun1=f1(opt1=0.1,opt2=0.2),fun2=f2(opt1=0.5))
class("metafilterfuncs")="metafiltering"
The question I have, then, is: how could I apply the filtering steps of an arbitrary metafiltering object to a given matrix m, using the specified functions and settings?
You can do something like this:
You define a sort of function pipeline where you give each function a priority.
pipelines <- c(f1=100,f2=300,f3=200)
I define 3 dummy functions here for testing:
f1 <- function(m,a) m + a
f2 <- function(m,b) m + b
f3 <- function(m,c) m + c
For each function, you store the argument in another list:
args <- list(f1=c(a=1),f2=c(b=2),f3=c(c=3))
Then you apply your functions:
m <- matrix(1:2,ncol=2)
for (func in names(pipelines[order(pipelines)]))
{
m <- do.call(func,list(m,args[[func]]))
}
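With the toy functions above, the loop applies f1, then f3, then f2 (priority order 100, 200, 300), so every entry of m ends up increased by 1 + 3 + 2 = 6:
m
#      [,1] [,2]
# [1,]    7    8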
pryr has a function, compose, that is like what you need, but it doesn't quite cut it. The compose function requires the functions to be given one by one, not in a list, and it cannot take arguments. It's also oddly placed in that package. A similar function can be found in plyr, namely each, but that function applies the functions individually rather than sequentially, and it outputs a named vector (or list?).
agstudy provided a solution above, but it suffers from a problem: it can only take scalar arguments, because it passes the arguments in a named vector. The solution is to use a named list instead. So here's an improved function to replace the one in pryr.
compose2 = function(x, funcs, args, msg_intermediate = F) {
  if (length(funcs) != length(args)) stop("length of functions and arguments must match")
  for (i in seq_along(funcs)) {
    # wrap x in list() so that non-scalar inputs (vectors, matrices, data frames)
    # are passed as one argument instead of being split element by element by c()
    x = do.call(what = funcs[[i]], args = c(list(x), args[[i]]))
    if ((i != length(funcs)) && msg_intermediate) message(x)
  }
  x
}
msg_intermediate is a handy debugging argument that messages the intermediate results, so one can more easily understand what happens.
Test it:
adder = function(x, n) x + n
compose2(0,
funcs = list(adder, adder, adder),
args = list(list(n = 1), list(n = 2), list(n = 3)),
msg_intermediate = T
)
Outputs:
1
3
[1] 6
This is what you get when you take 0, then add 1 (=1), then add 2 (=3), then add 3 (=6).
The args argument for compose2 takes a list of lists, so that one can supply non-scalar function arguments. Here's an example:
add_div = function(x, n, d) (x + n) / d
compose2(0,
funcs = list(add_div, add_div, add_div),
args = list(list(n = 1, d = 1), list(n = 2, d = 2), list(n = 3, d = 3)),
msg_intermediate = T
)
Output:
1
1.5
[1] 1.5
Which is what you get when you take 0, add 1, divide by 1 (=1), then take 1, add 2 then divide by 2 (=1.5), then take 1.5, add 3 and then divide by 3 (=1.5).
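Since compose2 passes the accumulating x as a single list element, it can also chain the matrix filters from the original question. A sketch (it assumes m, f1 and f2 as defined in the question are re-created first, since the dummy f1/f2 above overwrote them, and that each step keeps more than one row; otherwise the filters would need drop = FALSE in their subsetting):
filters     <- list(f1, f2)                            # the question's filter functions
filter_args <- list(list(opt1 = 0.1, opt2 = 0.2),      # options for f1
                    list(opt1 = 0.5))                  # options for f2
filtered <- compose2(m, funcs = filters, args = filter_args)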
I'm working on subsets of data from multiple time periods and I'd like to do column and level reduction on my training set and then apply the same actions to other datasets of the same structure.
dataframeReduce in the Hmisc package is what I've been using, but applying the function to different datasets results in slightly different actions.
trainPredictors<-dataframeReduce(trainPredictors,
fracmiss=0.2, maxlevels=20, minprev=0.075)
testPredictors<-dataframeReduce(testPredictors,
fracmiss=0.2, maxlevels=20, minprev=0.075)
testPredictors<-testPredictors[,names(trainPredictors)]
The final line ends up erroring because testPredictors has had a column removed that trainPredictors retains. All the other sets should have the same transformations applied to them as were applied to trainPredictors.
Does anyone know how to apply the same cleanup actions to multiple datasets either using dataframeReduce or another function/block of code?
An example
Using the function NAins from http://trinkerrstuff.wordpress.com/2012/05/02/function-to-generate-a-random-data-set/
NAins <- NAinsert <- function(df, prop = .1){
  n <- nrow(df)
  m <- ncol(df)
  num.to.na <- ceiling(prop * n * m)
  id <- sample(0:(m*n - 1), num.to.na, replace = FALSE)
  rows <- id %/% m + 1
  cols <- id %% m + 1
  sapply(seq(num.to.na), function(x){
    df[rows[x], cols[x]] <<- NA
  })
  return(df)
}
library("Hmisc")
trainPredictors<-NAins(mtcars, .1)
testPredictors<-NAins(mtcars, .3)
trainPredictors<-dataframeReduce(trainPredictors,
fracmiss=0.2, maxlevels=20, minprev=0.075)
testPredictors<-dataframeReduce(testPredictors,
fracmiss=0.2, maxlevels=20, minprev=0.075)
testPredictors<-testPredictors[,names(trainPredictors)]
If your goal is to have the same variables with the same levels, then you need to avoid using dataframeReduce a second time. Instead, use the same columns as produced by the dataframeReduce operation on the train set, and apply the factor-reduction logic to the test set in a manner that results in whatever degree of homology is needed for subsequent comparison operations. If a predict operation is planned, then you need the levels to be the same, and you need to modify the code in dataframeReduce that works on the levels:
if (is.category(x) || length(unique(x)) == 2) {
    tab <- table(x)
    if ((min(tab)/n) < minprev) {
        if (is.category(x)) {
            x <- combine.levels(x, minlev = minprev)
            s <- "grouped categories"
            if (length(levels(x)) < 2)
                s <- paste("prevalence<", minprev, sep = "")
        }
        else s <- paste("prevalence<", minprev, sep = "")
    }
}
So a better problem statement is likely to produce a better strategy. It will probably require knowing both which levels occur in the entire set and in the train and test sets, as well as what testing or predictions are anticipated (but not yet stated).
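For the simpler column-matching half of that advice, a minimal sketch (assuming it is acceptable to leave the test-set factor levels as they are) is to run dataframeReduce on the training set only and then subset the test set to the surviving columns; harmonising the levels would still need the kind of logic quoted above:
trainPredictors <- dataframeReduce(trainPredictors,
                                   fracmiss = 0.2, maxlevels = 20, minprev = 0.075)
testPredictors  <- testPredictors[, names(trainPredictors)]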