lapply with condition? - r

I try to do a function which works with for loops with lapply instead. I am very new to R and not confortable with lapply. I especially don't get how to make an "if" condition.
My current code with for loops looks like that (it nomalizes volume series):
function(TableVolume,VolumeM,VolumeStD,n){
TableBN = TableVolume[n:nrow(TableVolume),]
for(k in 1:nrow(TableBN)){for (i in 2:ncol(TableBN)){if(!is.na(VolumeM[k,i]) && (VolumeM[k,i]) && (TableVolume[n-1+k,i]>VolumeM[k,i]+1.96/sqrt(n)*VolumeStD[k,i])){TableBN[k,i]=TableVolume[n-1+k,i]/VolumeM[k,i]}else{TableBN[k,i]=0}}}
TableBN=TableBN[n:nrow(TableVolume),]
return(TableBN)
}
I know from Apply over two data frames how to do a function that works with 2 data frame but I still don't see how to handle tests.
Thanks for your support,
Vincent

You need use lapply (or other apply family function). It is generally required when you have some non-vectorized function to apply to vectorized argument. Your function and conditions is a combinations of arithmetic functions, which are nicely vectorized. So you can use subsetting and ifelse function please see as below:
set.seed(123)
# simulation
TableBN <- matrix(1:12, nrow = 3)
VolumeM <- matrix(12:1, nrow = 3)
VolumeStD <- matrix(12:1, nrow = 3)
TableVolume <- matrix(abs(rnorm(10)), nrow = 5)
# function
f <- function(TableVolume, VolumeM, VolumeStD, n){
TableBN <- TableVolume[n:nrow(TableVolume), ]
ifelse(
test = !is.na(VolumeM) && VolumeM && TableBN[, -1] > (VolumeM + 1.96 / sqrt(n) * VolumeStD),
yes = TableBN[, -1] / VolumeM,
no = 0)
}
# Test
f(TableVolume, VolumeM, VolumeStD, 3)
Output:
0

Related

Saving recursive function results to a global data frame in R

I'm trying to recreate the functionality of the memoise package in base R by saving the outputs of a recursive function in a data frame. I have this function "P" and then I made this "metaP" wrapper that will run P(n) if metaP(n) hasn't been run before and then save the results of P(n), or it produces the previously saved output. My issue is it only works at the first level. If I run metaP(5) it will save the output of metaP(5), but in order to get P(5) it also had to calculate P(4) and the results of P(4) aren't getting saved. I'm assuming it's getting lost in the recursive environments, but when I tried using the assign function and setting it to the global environment it still didn't work.
In the example below, I run metaP 5 through 10, and df has 5 through 10 saved, but it doesn't have 1 through 5 saved, some of which must have been calculated to come up with the answers of 5 through 10.
df <- data.frame(n = 0, pn = 1)
metaP <- function(n) {
if (!n %in% df$n) df <<- rbind(df, data.frame(n = n, pn = P(n)))
df[df$n == n, "pn"]
}
P <- function(n) {
if (n < 0) return(0)
k <- rep(1:((sqrt(24 * n + 1) + 1) / 6), each = 2) * c(1, -1)
return(sum((-1) ^ (k + 1) * sapply(n - k * (3 * k - 1) / 2, metaP)) %% 1e6)
}
sapply(5:10, metaP)
df
The issue here is kind of subtle. The expression
df <<- rbind(df, data.frame(n = n, pn = P(n)))
is ambiguous, because the ?rbind documentation doesn't define the order in which the two arguments to rbind() are evaluated. It appears that R is evaluating df, then doing the recursive call, then appending that result to the saved value of df. Any changes to the global variable that happened during the recursive call are lost.
To fix this, rewrite the conditional part as
if (!n %in% df$n) {
newval <- data.frame(n = n, pn = P(n))
df <<- rbind(df, newval)
}
(I'd also suggest adding parens to the test, and writing it as if (!(n %in% df$n)), because it's not immediately obvious that these are the same. I was confused about this in an earlier answer to this question. But checking ?Syntax shows that %in% has higher priority than !.)

How do I turn a loop into a vector-based code?

I need to split a dataframe into N parts which shoud have a given length ("proglen").
I created a for-loop which gives me the desired result as a list of Dataframes.
Now I want to change the for-loop to a vector based code.
I'm not sure how to convert a loop into vector based code in R.
obslen = 600 ;proglen = 50 ; N = 50 ; testdat <- list()
for (i in 1:N){
test <- df[df$d >= df$d[obslen + i * proglen] &
df$d < df$d[proglen + obslen +i * proglen],]
testdat[[i]] <- test
}
The variable $d has the type Posixct.
The result should contain N Dataframes.
These dfs start at the day [obslen + i * proglen] and have the length "proglen".
This is almost similar to what you have done, just replaced by lapply and a little bit more readable. This will give you a list of data frames. This can also be done with recursive function but will be difficult to understand (in my view).
lapply(1:N, function(idx){
this_idx <- obslen + idx * proglen
next_idx <- obslen + (idx + 1) * proglen
df[this_idx:next_idx, ]
})
If the dates are distinct then you can always use the split function:
testdat <- split(df[obslen + seq(1, N * proglen), ], rep(seq(N), each = proglen))

Avoiding a loop when populating data frames in R

I have an empty data frame T_modelled with 2784 columns and 150 rows.
T_modelled <- data.frame(matrix(ncol = 2784, nrow = 150))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
where
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
I filled T_modelled by column with a nested for loop, based on a formula:
for (i in 1:ncol(T_modelled)) {
col_tmp <- colnames(T_modelled)[i]
for (j in 1:nrow(T_modelled)) {
z_tmp <- z[j]-0.1
T_tmp <- MANSRT+As*e^(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
T_modelled[j ,col_tmp] <- T_tmp
}
}
where
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
c <- 790
k <- 0.00219
pb <- 2600
K <- (k*1000)/(c*pb)
e <- exp(1)
I do get the desired results but I keep thinking there must be a more efficient way of filling that data frame. The loop is quite slow and looks cumbersome to me. I guess there is an opportunity to take advantage of R's vectorized way of calculating. I just cannot see myself how to incorporate the formula in an easier way to fill T_modelled.
Anyone got any ideas how to get the same result in a faster, more "R-like" manner?
I believe this does it.
Run this first instruction right after creating T_modelled, it will be needed to test that the results are equal.
Tm <- T_modelled
Now run your code then run the code below.
z_tmp <- z - 0.1
for (i in 1:ncol(Tm)) {
T_tmp <- MANSRT + As*exp(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
Tm[ , i] <- T_tmp
}
all.equal(T_modelled, Tm)
#[1] TRUE
You don't need the inner loop, that's the only difference.
(I also used exp directly but that is of secondary importance.)
Much like your previous question's solution which you accepted, consider simply using sapply, iterating through the vector, t_sec_ERT, which is the same length as your desired dataframe's number of columns. But first adjust every element of z by 0.1. Plus, there's no need to create empty dataframe beforehand.
z_adj <- z - 0.1
T_modelled2 <- data.frame(sapply(t_sec_ERT, function(ert)
MANSRT+As*e^(-z_adj*(omega/(2*K))^0.5)*sin(omega*ert-((omega/(2*K))^0.5)*z_adj)))
colnames(T_modelled2) <- paste0("t=", t_sec_ERT)
rownames(T_modelled2) <- paste0("z=", z)
all.equal(T_modelled, T_modelled2)
# [1] TRUE
Rui is of course correct, I just want to suggest a way of reasoning when writing a loop like this.
You have two numeric vectors. Functions for numerics in R are usually vectorized. By which I mean you can do stuff like this
x <- c(1, 6, 3)
sum(x)
not needing something like this
x_ <- 0
for (i in x) {
x_ <- i + x_
}
x_
That is, no need for looping in R. Of course looping takes place none the less, it just happens in the underlying C, Fortran etc. code, where it can be done more efficiently. This is usually what we mean when we call a function vectorized: looping takes place "under the hood" as it were. The output of Vectorize() thus isn't strictly vectorized by this definition.
When you have two numeric vectors you want to loop over you have to first see if the constituent functions are vectorized, usually by reading the docs.
If it is, you continue by constructing that central vectorized compound function and and start testing it with one vector and one scalar. In your case it would be something like this (testing with just the first element of t_sec_ERT).
z_tmp <- z - 0.1
i <- 1
T_tmp <- MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
Looks OK. Then you start looping over the elements of t_sec_ERT.
T_tmp <- matrix(nrow=length(z), ncol=length(t_sec_ERT))
for (i in 1:length(t_sec_ERT)) {
T_tmp[, i] <- MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
}
Or you can do it with sapply() which is often neater.
f <- function(x) {
MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*x - ((omega/(2*K))^0.5)*z_tmp)
}
T_tmp <- sapply(t_sec_ERT, f)
I would prefer to put the data in a long format, with all combinations of z and t_sec_ERT as two columns, in order to take advantage of vectorization. Although I usually prefer tidyr for switching between long and wide formats, I've tried to keep this as a base solution:
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
v <- expand.grid(t_sec_ERT, z)
names(v) <- c("t_sec_ERT", "z")
v$z_tmp <- v$z-0.1
v$T_tmp <- MANSRT+As*e^(-v$z_tmp*(omega/(2*K))^0.5)*sin(omega*v$t_sec_ERT-((omega/(2*K))^0.5)*v$z_tmp)
T_modelled <- data.frame(matrix(v$T_tmp, nrow = length(z), ncol = length(t_sec_ERT), byrow = TRUE))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))

R: Using Mapply to avoid loop

I have following code which I perform with for loop to generate a 3D array "newarr".
n1<-c(1,2,3,4,5)
n2<-c(3,4,5,6,7)
n3<- c(4,5,6,7,8)
afun <- function(y,p,q,r){
calc=1/(1+(((y-p)/q)^(2*r)))
return(calc)
}
newarr<- array(dim = c(4,5,5))
Amat<-matrix(data=c(1:10,NA,NA,NA,NA,15:20),nrow = 4,ncol = 5)
Bmat<-matrix(data=c(1:6,NA,NA,NA,NA,11:20),nrow = 4,ncol = 5)
Qmat<- +(!is.na(Amat) & !is.na(Bmat))
for(i in 1:5){
res<-afun(Amat,n1[i],n2[i],n3[i])
newarr[,,i]<- res
}
I want to use Mapply (or any apply function) instead of for loop.
arr2 <- array((mapply(function(x,y,n1,n2,n3) if(x==1) afun(y,n1,n2,n3) else 0,Qmat,Amat,n1,n2,n3)),c(4,5,5))
The following code seems to help but without condition as described further:
newarr <- array(mapply(afun, n1,n2,n3, MoreArgs = list(y=Amat)), c(4,5,5))
I want to include a condition by using Qmat (a 4x5 matrix with 0 and 1), so that when '0' is observed, no operation should be performed and return '0'value to fill matrix. For 1s in Qmat, perform 'afun' function and return the value to "newarr" to form a 3D matrix.
Thanks for the help!!
This will give you the desired result:
newarr <- array(mapply(afun, n1,n2,n3, MoreArgs = list(y=Amat)), c(4,5,5))
this is using the implicit SIMPLIFY=TRUE of mapply(). With SIMPLIFY=TRUE each result is reduced to a vector. (a matrix is a vector with a dimension attribute)
Another variant is
array(unlist(mapply(afun, n1,n2,n3, MoreArgs = list(y=Amat), SIMPLIFY = FALSE)), c(4,5,5))
here you will get a list of matrices as result from mapply(). You have to unlist.

How to sum over range and calculate a series in R?

Here is the formula which I am trying to calculate in R.
So far, this is my approach using a simplified example
t <- seq(1, 2, 0.1)
expk <- function(k){exp(-2*pi*1i*t*k)}
set.seed(123)
dat <- ts(rnorm(100), start = c(1994,3), frequency = 12)
arfit <- ar(dat, order = 4, aic = FALSE) # represent \phi in the formula
tmp1 <- numeric(4)
for (i in seq_along(arfit$ar)){
ek <- expk(i)
arphi <- arfit$ar[i]
tmp1[i] <- ek * arphi
}
tmp2 <- sum(tmp1)
denom = abs(1-tmp2)^2
s2 <- t/denom
Error : Warning message:
In tmp1[i] <- ek * arphi :
number of items to replace is not a multiple of replacement length
I was trying to avoid using for loop and tried using sapply as in solutions to this question.
denom2 <- abs(1- sapply(seq_along(arfit$ar), function(x)sum(arfit$ar[x]*expf(x))))^2
but doesnt seem to be correct. The problem is to do the sum of the series(over index k) when it is taking values from another vector as well, in this case, t which is in the numerator.
Any solutions ?
Any suggestion for a test dataset, maybe using 0 and 1 to check if the calculation is done correctly in this loop here ?
Typing up the answer determined in chat. Here's a solution involving vapply.
First correct expk to:
expk <- function(k){sum(exp(-2*pi*1i*t*k))}
Then you can create this function and vapply it:
myFun <- function(i) return(expk(i) * arfit$ar[i])
tmp2 <- sum(vapply(seq_along(arfit$ar), myFun, complex(1)))

Resources