lapply for a function that requires indexing - r

I'm trying to find the log return along a vector of prices, but I'm not sure how to reference an index inside a function that is used with an apply function.
Here's what I'm using now:
set.seed(456)
df1 <- data.frame(id = 1:20, col1 = round( runif(20) * 100 ,0))
df1[,'logDiff'] <- NA
for(i in 2:20){
  df1[i,'logDiff'] <- log(df1[i,'col1'] / df1[i-1,'col1'])
}
Any suggestions?
EDIT:
I have a bunch of columns to do this for and would like to use something like this:
colsToUse <- c('col1','col2','col3')
lagLogDf <- as.data.frame(lapply(df1[,colsToUse], lagLogFunction))

Since log(a / b) = log(a) - log(b), the log return is the difference between consecutive values of log(price), so you can use the diff function:
df1$logDiff = c(NA, diff(log(df1$col1)))
Alternatively (for instance, if your operation were more complicated than differences between consecutive values), you could use head and tail to get the vector without its first element and without its last element, and work with them in a vectorized way:
df1$logDiff = c(NA, log(tail(df1$col1, -1) / head(df1$col1, -1)))
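For the EDIT (several columns at once), one option is to wrap the diff() answer in a small function and pass the function itself, not a call like lagLogFunction(x), to lapply(). This is a sketch: the name lagLogFunction comes from the question, while the three-column data frame is made up for illustration, with 1 added to each price so that log() never sees a zero.

```r
# Hypothetical price data with three columns (made up for this sketch);
# "+ 1" keeps every price positive so log() stays finite
set.seed(456)
df1 <- data.frame(id = 1:20,
                  col1 = round(runif(20) * 100) + 1,
                  col2 = round(runif(20) * 100) + 1,
                  col3 = round(runif(20) * 100) + 1)

# Log return of each element relative to the previous one; the first
# element has no predecessor, so it gets NA to keep the original length
lagLogFunction <- function(x) c(NA, diff(log(x)))

colsToUse <- c("col1", "col2", "col3")

# Pass the function object to lapply(); lapply() supplies each column as x
lagLogDf <- as.data.frame(lapply(df1[, colsToUse], lagLogFunction))
head(lagLogDf)
```

The result keeps the column names from colsToUse, so it can be bound back onto df1 with cbind() or assigned under new names if needed.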

Related

Apply a function that takes two vectors as input to two matrices across their corresponding columns (in R)

I have a function in R that takes two vectors and an integer like this:
MyFunction<-function(MyLoot, DungeonLoot, DungeonOrder)
{
  # Perfect Match
  PerfectMatch = MyLoot[1]==DungeonLoot[1]
  # Group Match
  GroupandClassMatch = (MyLoot[2]==DungeonLoot[2])&(MyLoot[3]>=DungeonLoot[3])
  # Order Match
  OrderMatch = MyLoot[5]==DungeonOrder
  # Order Match +1
  OrderMatchPlusOne = (OrderMatch)&(MyLoot[8]==1)
  # Final Score
  Score = PerfectMatch*1 + GroupandClassMatch*1 + OrderMatch*2 + OrderMatchPlusOne*1
  return(Score)
}
Now I want to apply MyFunction to two matrices Matrix1 and Matrix2 across their rows so that I have a vector that looks something like:
c(MyFunction(Matrix1[1,],Matrix2[1,],12), MyFunction(Matrix1[2,],Matrix2[2,],12), ..., MyFunction(Matrix1[n,],Matrix2[n,],12))
What is the best (and most efficient) way of doing this? I could use a for loop, but I'm wondering whether there is a better way, e.g. using one of the apply, sapply, mapply or lapply functions.
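Your newly posted function is still easily vectorized. This version assumes each record is stored as a column of the matrices (as in the question title and the asker's split(..., col(...)) solution), so MyLoot[1,] holds the first field of every record: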
MyFunction <- function(MyLoot, DungeonLoot, DungeonOrder) {
  # Perfect Match
  PerfectMatch <- MyLoot[1,] == DungeonLoot[1,]
  # Group Match
  GroupandClassMatch <- (MyLoot[2,] == DungeonLoot[2,]) & (MyLoot[3,] >= DungeonLoot[3,])
  # Order Match
  OrderMatch <- MyLoot[5,] == DungeonOrder
  # Order Match +1
  OrderMatchPlusOne <- OrderMatch & (MyLoot[8,] == 1)
  # Final Score
  Score <- PerfectMatch + GroupandClassMatch + OrderMatch*2 + OrderMatchPlusOne
  return(Score)
}
MyFunction(Matrix1, Matrix2, 12)
Example data along with the expected output would have helped to better understand the question. However, I don't think you need a loop or an apply function here: if your inputs are matrices, you can compare them directly.
For example, this counts the matching entries in each row:
rowSums(Matrix1 == Matrix2)
#For only specific columns
#rowSums(Matrix1[, 1:2] == Matrix2[, 1:2])
This achieved what I wanted, though I'm not sure it is the most computationally efficient solution:
t(mapply(function(x,y) MyFunction(x,y,11), split(MyLoot, col(MyLoot)), split(DungeonLoot, col(DungeonLoot))))
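To check that the vectorized version gives the same scores as calling the original function once per column pair, here is a small self-contained sketch. It assumes each record is a column with at least 8 fields; the random 8 x 10 matrices, the DungeonOrder value 2, and the names score_one/score_all are hypothetical stand-ins, and vapply() over column indices is used in place of the split()/mapply() construction.

```r
# Per-record version (same logic as the question's MyFunction):
# MyLoot and DungeonLoot are single records (vectors of >= 8 fields)
score_one <- function(MyLoot, DungeonLoot, DungeonOrder) {
  PerfectMatch <- MyLoot[1] == DungeonLoot[1]
  GroupandClassMatch <- (MyLoot[2] == DungeonLoot[2]) & (MyLoot[3] >= DungeonLoot[3])
  OrderMatch <- MyLoot[5] == DungeonOrder
  OrderMatchPlusOne <- OrderMatch & (MyLoot[8] == 1)
  PerfectMatch + GroupandClassMatch + OrderMatch * 2 + OrderMatchPlusOne
}

# Vectorized version: each column is one record, so row k holds
# field k of every record and each comparison covers all records at once
score_all <- function(MyLoot, DungeonLoot, DungeonOrder) {
  PerfectMatch <- MyLoot[1, ] == DungeonLoot[1, ]
  GroupandClassMatch <- (MyLoot[2, ] == DungeonLoot[2, ]) & (MyLoot[3, ] >= DungeonLoot[3, ])
  OrderMatch <- MyLoot[5, ] == DungeonOrder
  OrderMatchPlusOne <- OrderMatch & (MyLoot[8, ] == 1)
  PerfectMatch + GroupandClassMatch + OrderMatch * 2 + OrderMatchPlusOne
}

# Hypothetical data: 8 fields x 10 records, small integers so matches occur
set.seed(1)
MyLoot <- matrix(sample(0:3, 80, replace = TRUE), nrow = 8)
DungeonLoot <- matrix(sample(0:3, 80, replace = TRUE), nrow = 8)

# One call per record, looping over column indices
looped <- vapply(seq_len(ncol(MyLoot)),
                 function(j) score_one(MyLoot[, j], DungeonLoot[, j], 2),
                 numeric(1))

# One call on the whole matrices
vectorized <- score_all(MyLoot, DungeonLoot, 2)

all.equal(looped, vectorized)
```

If the records are rows instead (as in the c(MyFunction(Matrix1[1,], ...), ...) example), transposing first, e.g. score_all(t(Matrix1), t(Matrix2), 12), should give the per-row scores.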

R Convert loop into function

I would like to clean up my code a bit and start to use more functions for my everyday computations (where I would normally use for loops). I have an example of a for loop that I would like to turn into a function. The problem I'm having is how to step through the constraint vectors without a loop. Here's what I mean:
## represents spectral data
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, replace = TRUE)))
## feature ranges by column number
frm <- c(438,563,953,963)
to <- c(548,803,1000,993)
nm <- c("WL890", "WL1080", "WL1400", "WL1375")
WL.ps <- list()
for (i in seq_along(frm)){
  ## finds the minimum value within the range constraints and returns the corresponding column name
  WL <- colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]], 1, which.min)]
  WL.ps[[i]] <- WL
}
new.df <- data.frame(WL.ps)
colnames(new.df) <- nm
The part I'm having trouble with is iterating through the values of the 'frm' and 'to' vectors. How do I step from frm[1] to frm[2], and so on, inside a function (apply or otherwise)?
Any advice would be greatly appreciated.
Thank you.
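You could write a function that returns the column name of the minimum value in each row for a given range of columns. I have used max.col on the negated data instead of apply(df, 1, which.min) to find each row's minimum, since max.col is more efficient than apply. Setting ties.method = "first" makes max.col pick the first minimum, as which.min does; its default breaks ties at random.
apply_fun <- function(data, x, y) {
  cols <- x:y
  ## max.col() on the negated values finds each row's minimum
  names(data[cols])[max.col(-data[cols], ties.method = "first")]
}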
Apply this function with Map, which steps through frm and to in parallel:
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
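A self-contained sketch that rebuilds the question's data, runs the Map() version, names the result, and checks it against the original loop. ties.method = "first" is set because max.col() breaks ties at random by default, while which.min() returns the first minimum; without it the two could disagree on rows with tied minima. Naming the list with setNames() before as.data.frame() is a small deviation from the question's data.frame() plus colnames<- step.

```r
# Question's data: 100 samples x 1000 spectral columns, plus a Sample id column
set.seed(11)
df <- data.frame(Sample = 1:100,
                 replicate(1000, sample(0:1000, 100, replace = TRUE)))
frm <- c(438, 563, 953, 963)
to  <- c(548, 803, 1000, 993)
nm  <- c("WL890", "WL1080", "WL1400", "WL1375")

# Column name of the row-wise minimum within columns x:y
apply_fun <- function(data, x, y) {
  cols <- x:y
  names(data[cols])[max.col(-data[cols], ties.method = "first")]
}

# Map() calls apply_fun(frm[1], to[1]), then apply_fun(frm[2], to[2]), ...
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
new.df <- as.data.frame(setNames(WL.ps, nm))

# The original loop, rewritten with lapply(), for comparison
loop_ps <- lapply(seq_along(frm), function(i) {
  colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]], 1, which.min)]
})

identical(WL.ps, loop_ps)
head(new.df)
```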

How to use lapply to find closest value in a list in R?

I'm trying to find the model-predicted value closest to a real observed value within a large dataframe. I believe I need to use lapply, but I'm really not sure. Thanks in advance, SE, and sorry if this is a repeat of a previous post, I looked.
df <- data.frame(pred = rnorm(50, mean = 100, sd = 10),
cand = I(replicate(50, rnorm(6, mean = 100, sd = 10), simplify = FALSE)))
So far, I've come up with a one-line function that works when run on a single row:
df$closest <- sapply( df, function(x) { which.min( abs( df$pred[x] - df$cand[[x]] ) ) } )
but I have two problems:
It doesn't work on the full data frame (it fails with the error below), probably because I'm new to the apply family.
It returns a list position, not the actual value, which is what I need.
Error in df$cand[[x]] : no such index at level 1
apply lets us operate on either the rows or the columns. Because you want to loop through the rows, a margin of 1 (rows) should get the job done.
We could use apply:
df$closest <- apply( df, MARGIN = 1, function(x) { x$cand[which.min( abs( x$pred - x$cand ) )] } )
Here, we can use Map instead of sapply: sapply loops over the columns, so the anonymous function's x is a whole column, which cannot be used for indexing. Map walks pred and cand in parallel, and indexing y with which.min returns the closest value itself rather than its position.
df$closest <- unlist(Map(function(x, y) y[which.min(abs(y - x))], df$pred, df$cand))
Or else, with sapply, we have to loop over the row indices instead of the columns.
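A runnable sketch of the row-index approach with sapply(), returning the closest candidate value rather than its position, which is what the question asks for. The seed is an addition for reproducibility, and the last lines cross-check the result against a Map() version.

```r
set.seed(42)  # hypothetical seed, only for reproducibility
df <- data.frame(pred = rnorm(50, mean = 100, sd = 10),
                 cand = I(replicate(50, rnorm(6, mean = 100, sd = 10),
                                    simplify = FALSE)))

# Loop over row indices; for row i, pick the candidate nearest to pred[i]
df$closest <- sapply(seq_len(nrow(df)), function(i) {
  cand_i <- df$cand[[i]]
  cand_i[which.min(abs(cand_i - df$pred[i]))]
})

# Same result with Map() walking both columns in parallel
closest_map <- unlist(Map(function(x, y) y[which.min(abs(y - x))],
                          df$pred, df$cand))

all.equal(df$closest, closest_map)
```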

How to substitute negative values with a calculated value in an entire dataframe

I've got a huge data frame with many negative values in different columns, and each of them should be replaced by its original value * 0.5.
I've tried many R functions, but I can't find one that works on the entire data frame.
I would like something like the following (non-working) piece of code:
mydf[] <- replace(mydf[], mydf[] < 0, mydf[]*0.5)
You can simply do:
mydf[mydf<0] <- mydf[mydf<0] * 0.5
If some columns are non-numeric, you may want to apply this to the numeric columns only:
ind <- sapply(mydf, is.numeric)
mydf1 <- mydf[ind]
mydf1[mydf1<0] <- mydf1[mydf1<0] * 0.5
mydf[ind] <- mydf1
You could try using lapply() on the entire data frame, making the replacements on each column in succession.
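mydf[] <- lapply(mydf, function(x) {
  ifelse(x < 0, x*0.5, x)
})
The lapply(), or list apply, function is intended to be used on lists, but data frames are a special type of list, so this works here. Assigning to mydf[] rather than mydf keeps the result a data frame instead of turning it into a plain list.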
In replace(), the values argument should have the same length as the number of TRUE values in the list (index) argument:
replace(mydf, mydf <0, mydf[mydf <0]*0.5)
Another option is set() from data.table, which is very efficient because it modifies the columns in place:
library(data.table)
for(j in seq_along(mydf)){
  i1 <- mydf[[j]] < 0
  set(mydf, i = which(i1), j = j, value = mydf[[j]][i1]*0.5)
}
data
set.seed(24)
mydf <- as.data.frame(matrix(rnorm(25), 5, 5))
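A quick base-R check, on the sample data above, that logical-matrix indexing, replace(), and column-wise lapply() give the same result. The data.table::set() loop is left out so the sketch needs no extra package; the names a, b, and c_df exist only for this comparison.

```r
set.seed(24)
mydf <- as.data.frame(matrix(rnorm(25), 5, 5))

# 1. Logical-matrix indexing on the whole data frame
a <- mydf
a[a < 0] <- a[a < 0] * 0.5

# 2. replace(): values has one entry per TRUE in the index
b <- replace(mydf, mydf < 0, mydf[mydf < 0] * 0.5)

# 3. Column by column; assigning into c_df[] keeps it a data frame
c_df <- mydf
c_df[] <- lapply(c_df, function(x) ifelse(x < 0, x * 0.5, x))

all.equal(a, b)
all.equal(a, c_df)
```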

Use an apply function to a subset of rows in a data frame - vectorised solution

I am trying to implement a vectorised solution that iterates through each row of a data frame and, on each iteration, passes two arguments to a function. These arguments correspond to columns in the data frame.
Here is some code which hopefully makes this question clearer.
fnATimesB <- function(a, b) {
  return(a * b)
}
vct.names <- c("mark", "fred", "ben")
vct.days <- c(1, 3, 5)
vct.salary <- c(1000, 4000, 5000)
df.data <- data.frame(name = vct.names, days = vct.days, sal = vct.salary)
# want to use something like the following:
sapply(df.data, fnATimesB, days , sal)
# expected result
# 1000
# 12000
# 25000
All the other solutions assume the called function is vectorized; here's another option if it isn't:
sapply( 1:nrow(df.data), function(x) {
  fnATimesB( df.data[x,'days'], df.data[x,'sal'] )
} )
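If fnATimesB really cannot be vectorized, mapply() is another option (a swapped-in alternative, not taken from the answers here): it calls the function once per element pair of the two columns, without indexing rows by hand. A minimal sketch with the question's data:

```r
fnATimesB <- function(a, b) {
  a * b
}

df.data <- data.frame(name = c("mark", "fred", "ben"),
                      days = c(1, 3, 5),
                      sal = c(1000, 4000, 5000))

# mapply() calls fnATimesB(days[i], sal[i]) for each i and simplifies
# the results to a vector
result <- mapply(fnATimesB, df.data$days, df.data$sal)
result
```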
Alternatively, you can use apply here and avoid the anonymous function by slightly modifying your original function. The only thing to remember is that apply converts the data set to a matrix, so the input should not contain non-numeric columns (hence df.data[-1L], which drops the name column). Here is an example:
fnATimesB <- function(df, a, b) {
  df[a] * df[b]
}
apply(df.data[-1L], 1L, fnATimesB, a = 'days', b = 'sal')
## [1] 1000 12000 25000
Simply write:
with(df.data, days * sal)
or, with your function,
with( df.data, fnATimesB(days,sal))
Either gives a vector, because * is already vectorized.
