I'm trying to recreate the functionality of the memoise package in base R by saving the outputs of a recursive function in a data frame. I have this function "P" and then I made this "metaP" wrapper that will run P(n) if metaP(n) hasn't been run before and then save the results of P(n), or it produces the previously saved output. My issue is it only works at the first level. If I run metaP(5) it will save the output of metaP(5), but in order to get P(5) it also had to calculate P(4) and the results of P(4) aren't getting saved. I'm assuming it's getting lost in the recursive environments, but when I tried using the assign function and setting it to the global environment it still didn't work.
In the example below, I run metaP 5 through 10, and df has 5 through 10 saved, but it doesn't have 1 through 5 saved, some of which must have been calculated to come up with the answers of 5 through 10.
df <- data.frame(n = 0, pn = 1)
metaP <- function(n) {
if (!n %in% df$n) df <<- rbind(df, data.frame(n = n, pn = P(n)))
df[df$n == n, "pn"]
}
P <- function(n) {
if (n < 0) return(0)
k <- rep(1:((sqrt(24 * n + 1) + 1) / 6), each = 2) * c(1, -1)
return(sum((-1) ^ (k + 1) * sapply(n - k * (3 * k - 1) / 2, metaP)) %% 1e6)
}
sapply(5:10, metaP)
df
The issue here is kind of subtle. The expression
df <<- rbind(df, data.frame(n = n, pn = P(n)))
is ambiguous, because the ?rbind documentation doesn't define the order in which the two arguments to rbind() are evaluated. It appears that R is evaluating df, then doing the recursive call, then appending that result to the saved value of df. Any changes to the global variable that happened during the recursive call are lost.
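The same effect can be reproduced in a few lines. This is only a sketch with made-up names (x and bump are not from the question): c() evaluates its arguments left to right, so the old value of x is read before the call that modifies the global.

```r
x <- 1

# Hypothetical helper: appends to the global x as a side effect, then returns 2
bump <- function() {
  x <<- c(x, 99)
  2
}

# The first argument (x) is evaluated before bump() runs,
# so the 99 appended inside bump() is overwritten by this assignment
x <- c(x, bump())
lost_update <- x
lost_update
```

The 99 never shows up in the final x, which is the same thing that happens to the rows saved during the recursive call.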
To fix this, rewrite the conditional part as
if (!n %in% df$n) {
newval <- data.frame(n = n, pn = P(n))
df <<- rbind(df, newval)
}
(I'd also suggest adding parens to the test, and writing it as if (!(n %in% df$n)), because it's not immediately obvious that these are the same. I was confused about this in an earlier answer to this question. But checking ?Syntax shows that %in% has higher priority than !.)
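If you are not tied to a data frame, an environment is another way to hold the cache in base R. The following is a minimal sketch, not the memoise package's implementation; the names cache and memoP are assumptions of mine. The value is computed first and stored afterwards, so nothing saved during the recursion is lost.

```r
# Hypothetical cache: an environment keyed by the character form of n
cache <- new.env()
assign("0", 1, envir = cache)   # P(0) = 1, as in the question's df

memoP <- function(n) {
  if (n < 0) return(0)
  key <- as.character(n)
  if (is.null(cache[[key]])) {
    # Same recurrence as P() in the question
    kmax <- floor((sqrt(24 * n + 1) + 1) / 6)
    k <- rep(seq_len(kmax), each = 2) * c(1, -1)
    prev <- vapply(n - k * (3 * k - 1) / 2, memoP, numeric(1))
    val <- sum((-1)^(k + 1) * prev) %% 1e6
    # Store only after the recursive calls have finished
    cache[[key]] <- val
  }
  cache[[key]]
}

memoP(10)
sort(as.numeric(ls(cache)))   # every n from 0 to 10 has been cached
```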
I can compute a recursive variable with a for loop like this:
df <- as.data.frame(cbind(1:10))
df$it <- NA
for(i in 1:length(df$V1)) {
df$it <- 0.5*df$V1 + dplyr::lag(df$it, default = 0)
}
df
But how can I do this with a function on the fly?
The following produces an error "C stack usage 15924032 is too close to the limit":
adstwm <- function(x){
0.5*x + adstwm(dplyr::lag(x, default = 0))
}
adstwm(df$V1)
I guess I need to define a stop for the process, but how?
You can use the cumulative sum to achieve the desired result. This sums all preceding values in a vector which is effectively the same as your recursive loop:
df$it_2 <- cumsum(0.5*df$V1)
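As a quick check of that claim, here is a sketch in base R only (the explicit loop below is my own restatement of the recursion, not the dplyr::lag version from the question):

```r
V1 <- 1:10

# Explicit recursion: it[i] = 0.5 * V1[i] + it[i - 1], starting from 0
it <- numeric(length(V1))
for (i in seq_along(V1)) {
  previous <- if (i > 1) it[i - 1] else 0
  it[i] <- 0.5 * V1[i] + previous
}

# The cumulative sum gives the same vector without a loop
it_2 <- cumsum(0.5 * V1)
all.equal(it, it_2)
```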
If you do want to write a recursive function, for example to address the comment below, you can include an if statement that makes it stop:
function_it <- function(vector, length) {
if (length == 1) 0.5*vector[length]
else 0.5*vector[length] + 0.1*function_it(vector, length-1)
}
df$it <- NA
for(i in 1:length(df$V1)) {
df$it[i] <- function_it(df$V1,i)
}
df
However, you still need the for loop because the function is not vectorised, so I'm not sure it really helps.
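One way to avoid both the explicit recursion and the outer loop might be Reduce() with accumulate = TRUE. This is a sketch under the assumption that the recursion is it[i] = 0.5 * V1[i] + 0.1 * it[i - 1], as in function_it above; the names step and it_reduce are mine.

```r
V1 <- 1:10

# One step of the recursion: combine the previous result with the next value
step <- function(prev, x) 0.5 * x + 0.1 * prev

# Start from 0 and keep all intermediate results; drop the initial 0
it_reduce <- Reduce(step, V1, accumulate = TRUE, 0)[-1]
it_reduce
```

Reduce() still loops internally, so this is mostly a matter of readability rather than speed.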
I can put the lagged for loop into the function without recursing the function:
df <- as.data.frame(cbind(1:10))
func.it2 <- function(x){
df$it2 <- NA
for(i in 1:length(df$V1)) {
df$it2 <- 0.5*df$V1 + 0.1*dplyr::lag(df$it2, default = 0)
}
df$it2
}
func.it2(df$V1)
df
df$it2 <- func.it2(df$V1)
df
That works. But why is df$it2 not available as a variable in df until I declare it (second last line) although it was already available in the for loop (line 5)?
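A likely explanation is that assigning to df$it2 inside a function creates a modified copy of df that is local to that call, and the global df is left untouched. A minimal sketch (add_col is a hypothetical name, not from the code above):

```r
df <- data.frame(V1 = 1:3)

add_col <- function() {
  # This modifies a local copy of df, not the global one
  df$it2 <- df$V1 * 2
  names(df)
}

inside <- add_col()   # names seen inside the function
outside <- names(df)  # names of the global df afterwards
inside
outside
```

The new column only reaches the global data frame when the function's return value is assigned to it, which is what the second-to-last line of the code above does.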
I have an empty data frame T_modelled with 2784 columns and 150 rows.
T_modelled <- data.frame(matrix(ncol = 2784, nrow = 150))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
where
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
I filled T_modelled by column with a nested for loop, based on a formula:
for (i in 1:ncol(T_modelled)) {
col_tmp <- colnames(T_modelled)[i]
for (j in 1:nrow(T_modelled)) {
z_tmp <- z[j]-0.1
T_tmp <- MANSRT+As*e^(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
T_modelled[j ,col_tmp] <- T_tmp
}
}
where
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
c <- 790
k <- 0.00219
pb <- 2600
K <- (k*1000)/(c*pb)
e <- exp(1)
I do get the desired results but I keep thinking there must be a more efficient way of filling that data frame. The loop is quite slow and looks cumbersome to me. I guess there is an opportunity to take advantage of R's vectorized way of calculating. I just cannot see myself how to incorporate the formula in an easier way to fill T_modelled.
Anyone got any ideas how to get the same result in a faster, more "R-like" manner?
I believe this does it.
Run this first instruction right after creating T_modelled. It is needed later to test that the results are equal.
Tm <- T_modelled
Now run your code then run the code below.
z_tmp <- z - 0.1
for (i in 1:ncol(Tm)) {
T_tmp <- MANSRT + As*exp(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
Tm[ , i] <- T_tmp
}
all.equal(T_modelled, Tm)
#[1] TRUE
You don't need the inner loop; that's the only difference.
(I also used exp directly but that is of secondary importance.)
Much like the solution you accepted for your previous question, consider simply using sapply to iterate through the vector t_sec_ERT, which has the same length as the number of columns in your desired data frame. But first adjust every element of z by 0.1. Plus, there's no need to create an empty data frame beforehand.
z_adj <- z - 0.1
T_modelled2 <- data.frame(sapply(t_sec_ERT, function(ert)
MANSRT+As*e^(-z_adj*(omega/(2*K))^0.5)*sin(omega*ert-((omega/(2*K))^0.5)*z_adj)))
colnames(T_modelled2) <- paste0("t=", t_sec_ERT)
rownames(T_modelled2) <- paste0("z=", z)
all.equal(T_modelled, T_modelled2)
# [1] TRUE
Rui is of course correct, I just want to suggest a way of reasoning when writing a loop like this.
You have two numeric vectors. Functions for numerics in R are usually vectorized. By which I mean you can do stuff like this
x <- c(1, 6, 3)
sum(x)
not needing something like this
x_ <- 0
for (i in x) {
x_ <- i + x_
}
x_
That is, no need for looping in R. Of course looping takes place none the less, it just happens in the underlying C, Fortran etc. code, where it can be done more efficiently. This is usually what we mean when we call a function vectorized: looping takes place "under the hood" as it were. The output of Vectorize() thus isn't strictly vectorized by this definition.
When you have two numeric vectors you want to loop over you have to first see if the constituent functions are vectorized, usually by reading the docs.
If they are, you continue by constructing that central vectorized compound function and start testing it with one vector and one scalar. In your case it would be something like this (testing with just the first element of t_sec_ERT).
z_tmp <- z - 0.1
i <- 1
T_tmp <- MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
Looks OK. Then you start looping over the elements of t_sec_ERT.
T_tmp <- matrix(nrow=length(z), ncol=length(t_sec_ERT))
for (i in 1:length(t_sec_ERT)) {
T_tmp[, i] <- MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
}
Or you can do it with sapply() which is often neater.
f <- function(x) {
MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*x - ((omega/(2*K))^0.5)*z_tmp)
}
T_tmp <- sapply(t_sec_ERT, f)
I would prefer to put the data in a long format, with all combinations of z and t_sec_ERT as two columns, in order to take advantage of vectorization. Although I usually prefer tidyr for switching between long and wide formats, I've tried to keep this as a base solution:
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
v <- expand.grid(t_sec_ERT, z)
names(v) <- c("t_sec_ERT", "z")
v$z_tmp <- v$z-0.1
v$T_tmp <- MANSRT+As*e^(-v$z_tmp*(omega/(2*K))^0.5)*sin(omega*v$t_sec_ERT-((omega/(2*K))^0.5)*v$z_tmp)
T_modelled <- data.frame(matrix(v$T_tmp, nrow = length(z), ncol = length(t_sec_ERT), byrow = TRUE))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
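Another option that may be worth trying is outer(), which builds the whole z-by-t matrix in one call. This is a sketch under the constants given in the question; the names a, z_adj and T_outer are mine, and K is computed inline so that c and k are not overwritten.

```r
MANSRT <- -2.051185
As <- 11.59375
omega <- (2 * pi) / (347.875 * 24 * 60 * 60)
K <- (0.00219 * 1000) / (790 * 2600)   # (k * 1000) / (c * pb) from the question

t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
z_adj <- z - 0.1
a <- sqrt(omega / (2 * K))   # damping and phase-lag factor shared by exp() and sin()

# outer() calls the function once with all (z, t) combinations as two long vectors
T_outer <- outer(z_adj, t_sec_ERT,
                 function(zz, tt) MANSRT + As * exp(-zz * a) * sin(omega * tt - a * zz))
dimnames(T_outer) <- list(paste0("z=", z), paste0("t=", t_sec_ERT))
dim(T_outer)
```

The result is a matrix rather than a data frame; wrap it in as.data.frame() if you need the original structure.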
Somewhat new to R (coming from SQL), trying to write a quick loop to generate a series of functions that perform a task a specified number of times (i.e. for function 2, do something 2 times, function 3 -> 3 times, etc.).
My issue is arising in that I'm using the iterative variable (in the below code, 'k') as a part of the lower-level function - as such, when I go to evaluate ANY of the subsequently generated functions, it returns the value of the function as of the last value of k (here, 4), no matter if I call function 2, 3, or 4.
My question is therefore how can I substitute the value of 'k' for the variable 'k' when I'm generating the lower level functions? For example, on the first iteration, when k = 2, I want to substitute '2' for every occurrence of 'k' in the lower level function, such that when function 2 is run later, it sees the value of '2', rather than the last value of 'k'?
Code below (note that the 'x' value the function will evaluate is a 1 column, variable row matrix):
x <- as.matrix(11:20)
for (k in 2:4) {
actvfun <- NULL
actvfun <- function(x) {
actv <- NULL
actvmtx <- NULL
actvmtx <- as.matrix(x)
for (j in 2:(k+1)) {
actv <- rep(NA, length(x))
for (i in j:length(x)) {
actv[i] <- x[i - (j - 1)]
}
actvmtx <- as.matrix(cbind(actvmtx, actv))
}
assign(paste("lag0", k, "av", sep=""), actvmtx)
return(apply(get(paste("lag0", k, "av", sep="")), 1, mean, na.rm=T))
assign(paste("LAGTEST0", k, "AV", sep=""), apply(actvmtx, 1, mean, na.rm=T))
}
assign(paste("v5LAGTEST0", k, ".av", sep=""), actvfun)
}
v5LAGTEST02.av(x)
v5LAGTEST03.av(x)
The last two items are the checks I was running - currently both return the result using k = 4, rather than their respective values of 2 and 3.
Any help is greatly appreciated - I know loops are somewhat frowned upon in R (as is 'assign', but I'm not sure how else to achieve the desired result of variable function names), so I'm certainly open to new suggestions!
Thanks,
Nate
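One possible approach is a function factory: each generated function is created inside its own call, so it keeps its own value of k instead of looking up the global k when it is finally run. The sketch below is an assumption-laden rewrite rather than a drop-in replacement; make_lag_avg and lag_funs are hypothetical names, the functions are stored in a named list instead of being created with assign(), and it assumes x has more than k elements.

```r
make_lag_avg <- function(k) {
  force(k)   # fix the value of k at creation time
  function(x) {
    x <- as.vector(x)
    n <- length(x)
    # Columns are x lagged by 0, 1, ..., k positions, padded with NA at the top
    lagged <- sapply(0:k, function(j) c(rep(NA, j), x[seq_len(n - j)]))
    rowMeans(lagged, na.rm = TRUE)
  }
}

# One function per k, stored in a named list
lag_funs <- lapply(setNames(2:4, paste0("lag", 2:4)), make_lag_avg)

x <- as.matrix(11:20)
lag_funs$lag2(x)
lag_funs$lag3(x)
```

With this setup lag_funs$lag2 and lag_funs$lag3 give different results, because each closure carries its own k.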
I have a large dataframe, alldata, for which I'm attempting to do a number of calculations that were previously done on an older version and written in base R. My goal is to create new columns with the outputs of these calculations using dplyr. The previous version of this code used dozens of intermediate dataframes and wrote these calculations out into separate files using functions.
I am curious if it is possible to preserve these functions as they are written, but embed them in dplyr to reference columns in alldata instead of these temporary matrices from the original version.
Here is a sample section of the code I'm working on. As you can see in the comments, I've translated the old R code into dplyr for the simple weighted mean function.
d_weighted = alldata %>%
# equivalent to by = list(regspp = data$regspp[inds]) from old code
group_by(regspp, year) %>%
# equivalent to wgtmean = function(x, na.rm=FALSE) wtd.mean(x=x[,1], weights=x[,2], na.rm=na.rm) from old code
mutate(lat_wgtmean = wtd.mean(x=lat, weights=wtcpue, na.rm=FALSE))
However, as the functions get more complicated than wgtmean, I'd like to just include the function as is.
The next function from the old code calculates a weighted standard deviation from a matrix where the first column consists of values and the second of weights:
wgtsd = function(mat, ...){
x = mat[,1][mat[,2]>0] # trim to values with weight > 0
w = mat[,2][mat[,2]>0]
sqrt(wtd.var(x=x, weights=w, ...))
}
Is it possible to embed this function in dplyr::mutate with value = lat (latitude) and weight = wtcpue (catch per unit effort, converted to weights) to create a new column in alldata that contains weighted standard deviations?
I recognize I could rewrite these functions, but I'd rather not do so for the more complex ones later in the text (see the below one as an example), and I'm curious if there is an elegant solution to integrate functions with matrix arguments with dplyr.
wgtskew = function(mat, na.rm=FALSE){
x = mat[,1][mat[,2]>0] # trim to values with weight > 0
w = mat[,2][mat[,2]>0]
if(na.rm){
s = !is.na(x+w)
x = x[s]
w = w[s]
}
n = length(x)
w = n * w / sum(w) # normalize
if(n>2){
c3 = n / ((n - 1) * (n - 2))
sdv = wgtsd(cbind(x, w), normwt = TRUE, na.rm = na.rm)
xbar = wtd.mean(x, w, na.rm = na.rm)
sk = c3 * sum(w ^ (3 / 2) * ((x - xbar) / sdv) ^ 3)
return(sk)
} else {
return(NA)
}
}
The packages matrixStats and Weighted.Desc.Stat contain many of the functions you might need.
Then you can either Find and Replace using your text editor, or, for example:
wgtsd <- function(...) matrixStats::weightedSd(...)
and run your script as before.
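If you would rather keep the matrix-style functions unchanged, one option is to build the two-column matrix at the call site with cbind(). The sketch below uses base R only so that it runs without extra packages; the sample data and wgtmean_mat are made up for illustration and stand in for wgtsd or wgtskew. The same cbind(lat, wtcpue) call should also work inside mutate() after group_by(), though that is not tested here.

```r
# Hypothetical sample data with the column names from the question
alldata <- data.frame(
  regspp = rep(c("a", "b"), each = 3),
  lat    = c(40, 41, 42, 50, 52, 54),
  wtcpue = c(1, 2, 1, 0, 1, 3)
)

# Stand-in for the old matrix-style functions: column 1 = values, column 2 = weights
wgtmean_mat <- function(mat) {
  keep <- mat[, 2] > 0   # trim to values with weight > 0
  weighted.mean(mat[keep, 1], mat[keep, 2])
}

# Apply per group; cbind() rebuilds the matrix the old function expects
rows <- as.numeric(seq_len(nrow(alldata)))
alldata$lat_wgtmean <- ave(rows, alldata$regspp, FUN = function(i) {
  wgtmean_mat(cbind(alldata$lat[i], alldata$wtcpue[i]))
})
alldata
```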
I am trying to rewrite a function that currently uses for loops so that it uses lapply instead. I am very new to R and not comfortable with lapply. In particular, I don't understand how to express an "if" condition.
My current code with for loops looks like this (it normalizes volume series):
function(TableVolume,VolumeM,VolumeStD,n){
TableBN = TableVolume[n:nrow(TableVolume),]
for (k in 1:nrow(TableBN)) {
for (i in 2:ncol(TableBN)) {
if (!is.na(VolumeM[k,i]) && (VolumeM[k,i]) && (TableVolume[n-1+k,i] > VolumeM[k,i] + 1.96/sqrt(n)*VolumeStD[k,i])) {TableBN[k,i] = TableVolume[n-1+k,i]/VolumeM[k,i]} else {TableBN[k,i] = 0}
}
}
TableBN=TableBN[n:nrow(TableVolume),]
return(TableBN)
}
I know from Apply over two data frames how to do a function that works with 2 data frame but I still don't see how to handle tests.
Thanks for your support,
Vincent
You don't need lapply (or any other apply-family function) here. Those are generally required when you have a non-vectorized function to apply to a vector argument. Your function and conditions are a combination of arithmetic functions, which are nicely vectorized. So you can use subsetting and the ifelse function, as shown below:
set.seed(123)
# simulation
TableBN <- matrix(1:12, nrow = 3)
VolumeM <- matrix(12:1, nrow = 3)
VolumeStD <- matrix(12:1, nrow = 3)
TableVolume <- matrix(abs(rnorm(10)), nrow = 5)
# function
f <- function(TableVolume, VolumeM, VolumeStD, n){
TableBN <- TableVolume[n:nrow(TableVolume), ]
ifelse(
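# use element-wise & here; && only accepts length-one operands
test = !is.na(VolumeM) & VolumeM & TableBN[, -1] > (VolumeM + 1.96 / sqrt(n) * VolumeStD),
yes = TableBN[, -1] / VolumeM,
no = 0)
}
# Test
f(TableVolume, VolumeM, VolumeStD, 3)
Output (a 3 x 4 matrix; every cell is 0 here because no simulated volume exceeds its threshold):
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0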
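For a version that follows the original loop more closely, here is a sketch that keeps the first column untouched and works on columns 2 onward. It assumes that VolumeM and VolumeStD have the same shape as the trimmed table (rows n to the end of TableVolume), which is how the loop indexes them. The function name and the test data are made up.

```r
normalise_volume <- function(TableVolume, VolumeM, VolumeStD, n) {
  TableBN <- TableVolume[n:nrow(TableVolume), , drop = FALSE]
  cols <- 2:ncol(TableBN)
  V <- TableBN[, cols, drop = FALSE]
  M <- VolumeM[, cols, drop = FALSE]
  S <- VolumeStD[, cols, drop = FALSE]
  # Same three conditions as the loop, evaluated for all cells at once
  keep <- !is.na(M) & M != 0 & V > M + 1.96 / sqrt(n) * S
  keep[is.na(keep)] <- FALSE   # treat undecidable cells as "else"
  TableBN[, cols] <- ifelse(keep, V / M, 0)
  TableBN
}

# Hypothetical test data: column 1 is an identifier, columns 2 and 3 are volumes
TableVolume <- matrix(c(1, 1, 1, 1, 10, 20, 30, 40, 5, 5, 5, 5), nrow = 4)
VolumeM     <- matrix(c(1, 1, 1, 10, 10, NA, 0, 2, 10), nrow = 3)
VolumeStD   <- matrix(1, nrow = 3, ncol = 3)
res <- normalise_volume(TableVolume, VolumeM, VolumeStD, 2)
res
```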