Avoiding a loop when populating data frames in R


I have an empty data frame T_modelled with 2784 columns and 150 rows.
T_modelled <- data.frame(matrix(ncol = 2784, nrow = 150))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
where
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
I filled T_modelled by column with a nested for loop, based on a formula:
for (i in 1:ncol(T_modelled)) {
  col_tmp <- colnames(T_modelled)[i]
  for (j in 1:nrow(T_modelled)) {
    z_tmp <- z[j]-0.1
    T_tmp <- MANSRT+As*e^(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
    T_modelled[j ,col_tmp] <- T_tmp
  }
}
where
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
c <- 790
k <- 0.00219
pb <- 2600
K <- (k*1000)/(c*pb)
e <- exp(1)
I do get the desired results, but I keep thinking there must be a more efficient way of filling that data frame. The loop is quite slow and looks cumbersome to me. I guess there is an opportunity to take advantage of R's vectorized calculations; I just can't see how to incorporate the formula so that it fills T_modelled more simply.
Does anyone have ideas on how to get the same result in a faster, more "R-like" manner?

I believe this does it.
Run this first instruction right after creating T_modelled; the copy will be needed later to test that the results are equal.
Tm <- T_modelled
Now run your code, then run the code below.
z_tmp <- z - 0.1
for (i in 1:ncol(Tm)) {
  T_tmp <- MANSRT + As*exp(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
  Tm[ , i] <- T_tmp
}
all.equal(T_modelled, Tm)
#[1] TRUE
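You don't need the inner loop; that's the only difference. Every function in the formula is vectorized, so z_tmp can be the whole vector z - 0.1 and each column is computed in one step.
(I also used exp() directly instead of e^, but that is of secondary importance.)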

Much like the solution you accepted for your previous question, consider simply using sapply to iterate through the vector t_sec_ERT, which has the same length as the number of columns you want in the data frame. But first subtract 0.1 from every element of z. Also, there's no need to create an empty data frame beforehand.
z_adj <- z - 0.1
T_modelled2 <- data.frame(sapply(t_sec_ERT, function(ert)
  MANSRT+As*e^(-z_adj*(omega/(2*K))^0.5)*sin(omega*ert-((omega/(2*K))^0.5)*z_adj)))
colnames(T_modelled2) <- paste0("t=", t_sec_ERT)
rownames(T_modelled2) <- paste0("z=", z)
all.equal(T_modelled, T_modelled2)
# [1] TRUE

Rui is of course correct; I just want to suggest a way of reasoning when writing a loop like this.
You have two numeric vectors. Functions for numerics in R are usually vectorized, by which I mean you can do things like this
x <- c(1, 6, 3)
sum(x)
without needing something like this
x_ <- 0
for (i in x) {
  x_ <- i + x_
}
x_
That is, there is no need for an explicit loop in R. Of course looping takes place nonetheless; it just happens in the underlying C, Fortran, etc. code, where it can be done more efficiently. This is usually what we mean when we call a function vectorized: the looping takes place "under the hood", as it were. By this definition, the output of Vectorize() isn't strictly vectorized, since it just calls your R function once per element via mapply().
When you have two numeric vectors you want to loop over, first check whether the constituent functions are vectorized, usually by reading the docs.
If they are, continue by constructing the central vectorized compound function and start testing it with one vector and one scalar. In your case it would be something like this (testing with just the first element of t_sec_ERT):
z_tmp <- z - 0.1
i <- 1
T_tmp <- MANSRT + As *
  exp(-z_tmp*(omega/(2*K))^0.5) *
  sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
Looks OK. Then you start looping over the elements of t_sec_ERT.
T_tmp <- matrix(nrow=length(z), ncol=length(t_sec_ERT))
for (i in 1:length(t_sec_ERT)) {
  T_tmp[, i] <- MANSRT + As *
    exp(-z_tmp*(omega/(2*K))^0.5) *
    sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
}
Or you can do it with sapply(), which is often neater.
f <- function(x) {
  MANSRT + As *
    exp(-z_tmp*(omega/(2*K))^0.5) *
    sin(omega*x - ((omega/(2*K))^0.5)*z_tmp)
}
T_tmp <- sapply(t_sec_ERT, f)
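A small variation on the sapply() version, sketched here as a suggestion rather than as part of this answer: vapply() takes a FUN.VALUE template, so it stops with an error if any call returns the wrong length or type, instead of silently falling back to a list. The parameters are copied from the question (with K computed directly from k, c and pb), and f is the same helper as in the answer.

```r
# Parameters copied from the question
MANSRT <- -2.051185
As <- 11.59375
omega <- (2 * pi) / (347.875 * 24 * 60 * 60)
K <- (0.00219 * 1000) / (790 * 2600)  # (k * 1000) / (c * pb) from the question

t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
z_tmp <- z - 0.1

# One column of the result (all depths) for a single time value x
f <- function(x) {
  MANSRT + As *
    exp(-z_tmp * (omega / (2 * K))^0.5) *
    sin(omega * x - ((omega / (2 * K))^0.5) * z_tmp)
}

# FUN.VALUE declares that every call must return length(z_tmp) doubles
T_tmp <- vapply(t_sec_ERT, f, FUN.VALUE = numeric(length(z_tmp)))
dim(T_tmp)  # 150 2784 (rows = depths, columns = times)
```

The result is the same matrix sapply() gives here; the template only buys an early, explicit failure if the formula is later changed so that some column comes back with a different length.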

I would prefer to put the data in a long format, with all combinations of z and t_sec_ERT as two columns, in order to take advantage of vectorization. Although I usually prefer tidyr for switching between long and wide formats, I've tried to keep this as a base solution:
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
v <- expand.grid(t_sec_ERT, z)
names(v) <- c("t_sec_ERT", "z")
v$z_tmp <- v$z-0.1
v$T_tmp <- MANSRT+As*e^(-v$z_tmp*(omega/(2*K))^0.5)*sin(omega*v$t_sec_ERT-((omega/(2*K))^0.5)*v$z_tmp)
T_modelled <- data.frame(matrix(v$T_tmp, nrow = length(z), ncol = length(t_sec_ERT), byrow = TRUE))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
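Building on the long-format idea, base R's outer() does the expand-and-reshape in one step: it evaluates a vectorized function on every (z, t) pair and returns the result already shaped as a matrix, with rows for depths and columns for times. This is a sketch of a swapped-in technique (outer() instead of expand.grid() plus matrix()), using the parameter values from the question; the helper temp_at and the shorthand a are hypothetical names introduced for illustration.

```r
# Parameters copied from the question
MANSRT <- -2.051185
As <- 11.59375
omega <- (2 * pi) / (347.875 * 24 * 60 * 60)
K <- (0.00219 * 1000) / (790 * 2600)  # (k * 1000) / (c * pb) from the question

t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
z_tmp <- z - 0.1
a <- (omega / (2 * K))^0.5  # hypothetical shorthand for the damping factor

# Vectorized in both arguments: outer() calls it once with expanded vectors
temp_at <- function(zz, tt) MANSRT + As * exp(-a * zz) * sin(omega * tt - a * zz)

# Element [i, j] is the temperature at depth z_tmp[i] and time t_sec_ERT[j]
T_mat <- outer(z_tmp, t_sec_ERT, temp_at)

T_modelled3 <- as.data.frame(T_mat)
names(T_modelled3) <- paste0("t=", t_sec_ERT)
rownames(T_modelled3) <- paste0("z=", z)
```

The same check the other answers use, all.equal(T_modelled, T_modelled3), should apply here, since only the order of operations differs from the loop.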

Related

Use an 'apply' function to perform code with conditional statements in R

I have been working on a project for which I need to find peaks and valleys in a dataset (not just the highest numbers per column, but all of the peaks and valleys).
I did manage to get it to work on 1 column, but I use a for-loop for that and I need to do this for about 50 columns, so I think I should use an 'apply' function. I just don't know how to do so. Can I put 'if' statements and such in an 'apply' function?
Here is what I used for checking one column:
('First' is the name of the dataset and 'Seq1' is the first column)
Lowest = 0
Highest = 0
Summits = vector('numeric')
Valleys = vector('numeric')
for (i in 1:length(First$Seq1))
{
  if (!is.na(First$Seq1[i+1]))
  {
    if (First$Seq1[i] < Lowest) {Lowest = First$Seq1[i]}
    if (First$Seq1[i] > Highest) {Highest = First$Seq1[i]}
    if (First$Seq1[i] > 0 && First$Seq1[i+1] < 0)
    { Summits <- append(Summits, Highest, after=length(Summits)) }
    if (First$Seq1[i] < 0 && First$Seq1[i+1] > 0)
    { Valleys <- append(Valleys, Lowest, after=length(Valleys)) }
  }
}

Sure you can! I would first define a helper function that defines what is to be done with one specific column and then you call that function within apply:
HelperFun <- function(x) {
  # your code from above, with First$Seq1 replaced by x
}
apply(First, 2, HelperFun)
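A sketch of what the filled-in helper could look like, assuming the goal is the question's exact logic for each column. Two choices here are additions, not part of the answer: the results are pre-allocated instead of grown with append(), and lapply() is used instead of apply(), because apply() first converts the data frame to a matrix and tries to simplify the per-column results, whereas lapply() walks the columns and keeps one list per column. The small First data frame is made up for illustration.

```r
HelperFun <- function(x) {
  n <- length(x)
  Lowest <- 0
  Highest <- 0
  # Pre-allocate: a series of length n has at most n - 1 sign changes
  Summits <- numeric(max(n - 1, 0))
  Valleys <- numeric(max(n - 1, 0))
  n_s <- 0
  n_v <- 0
  for (i in seq_len(max(n - 1, 0))) {
    if (is.na(x[i + 1])) next
    # Running extremes, exactly as in the question's loop
    if (x[i] < Lowest) Lowest <- x[i]
    if (x[i] > Highest) Highest <- x[i]
    if (x[i] > 0 && x[i + 1] < 0) {   # positive-to-negative crossing
      n_s <- n_s + 1
      Summits[n_s] <- Highest
    }
    if (x[i] < 0 && x[i + 1] > 0) {   # negative-to-positive crossing
      n_v <- n_v + 1
      Valleys[n_v] <- Lowest
    }
  }
  # Drop the unused pre-allocated slots
  list(Summits = Summits[seq_len(n_s)], Valleys = Valleys[seq_len(n_v)])
}

# Made-up example data standing in for the real First data frame
First <- data.frame(Seq1 = c(1, 3, -2, -5, 2, 4, -1),
                    Seq2 = c(-1, 2, 5, -3, -1, 2, 1))
res <- lapply(First, HelperFun)
res$Seq1$Summits  # 3 4
```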

An *apply function is not better for this than a for loop, provided you don't grow an object inside the loop. Never use append in a loop; pre-allocate your results object and fill it instead.
This finds all local minima on a grid:
#an example
set.seed(42)
plane <- matrix(rnorm(100, sd = 5), 10)
#plot
library(raster)
plot(raster(plane))
#initialize a logical matrix
res <- matrix(TRUE, ncol = ncol(plane), nrow = nrow(plane))
#check for each subgrid of 2 times 2 cells which of the cells is the minimum
for (i in 1:(nrow(plane) - 1)) {
  for (j in 1:(ncol(plane) - 1)) {
    inds <- as.matrix(expand.grid(r = i + 0:1, c = j + 0:1))
    #cell must be a minimum of all 4 subgrids it is part of
    res[inds] <- res[inds] & plane[inds] == min(plane[inds])
  }
}
print(res)
plane[res]
#[1] -13.282277 -8.906542 -8.585043 -12.071038 -3.919195 -14.965450 -5.215595 -5.498904 -5.971644 -2.380870 -7.296070
#highlight local minima
plot(rasterToPolygons(raster(res)), border = t(res), add = TRUE)
library(reshape2)
res1 <- melt(res)
res1 <- res1[res1$value,]
text(x = res1$Var2 /10 - 0.05,
     y = 1-res1$Var1 /10 + 0.05,
     labels = round(plane[res],1))
I've assumed here that diagonal neighbors count as neighbors, not only neighbors in the same row or column, but this would be trivial to change.
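One way to sketch the row/column-only variant, without raster and without the 2 x 2 loop, is a different technique: compare the matrix with shifted copies of itself. The matrix is padded with Inf so edge cells are only compared with neighbors that exist, and a cell counts as a minimum if it is no larger than its four rook neighbors (ties count, as they do with == min() above). local_min_rook is a hypothetical helper name.

```r
# Hypothetical helper: TRUE where a cell is <= its up, down, left and right neighbors
local_min_rook <- function(m) {
  nr <- nrow(m)
  nc <- ncol(m)
  # Pad with Inf so edge cells never lose a comparison to a missing neighbor
  p <- matrix(Inf, nr + 2, nc + 2)
  p[2:(nr + 1), 2:(nc + 1)] <- m
  r <- 2:(nr + 1)
  cc <- 2:(nc + 1)
  m <= p[r - 1, cc] & m <= p[r + 1, cc] & m <= p[r, cc - 1] & m <= p[r, cc + 1]
}

set.seed(42)
plane <- matrix(rnorm(100, sd = 5), 10)
res_rook <- local_min_rook(plane)
plane[res_rook]
```

Because every cell that is minimal among all eight neighbors is also minimal among four, this version finds at least the minima of the diagonal version, usually more.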

I know this is not quite the solution you want, since you have a one-dimensional time series, but here is a (more direct) variation on Roland's solution.
#example data
set.seed(42)
plane <- matrix(rnorm(100, sd = 5), 10)
library(raster)
r <- raster(plane)
f <- focal(r, matrix(1,3,3), min, pad=TRUE, na.rm=TRUE)
x <- r == f
mins <- mask(r, x, maskvalue=FALSE)
pts <- rasterToPoints(mins)
cells <- cellFromXY(x, pts)
r[cells]
plot(r)
text(mins, digits=1)
plot(rasterToPolygons(mins), add=TRUE)
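For the one-dimensional series in the question, a vectorized sketch (a different technique from both answers above) uses the signs of successive differences: at a strict peak the series stops rising and starts falling, so diff(sign(diff(x))) equals -2 there, and a strict valley gives +2. Plateaus of equal neighboring values are not detected by this version, and it finds local extrema rather than the running extremes the question's loop records. local_extrema is a hypothetical helper name and the example series is made up.

```r
# Hypothetical helper: indices of strict local maxima and minima of a numeric vector
local_extrema <- function(x) {
  d <- diff(sign(diff(x)))
  # + 1L maps an index in d back to the middle point in x
  list(peaks = which(d == -2) + 1L, valleys = which(d == 2) + 1L)
}

x <- c(1, 3, -2, -5, 2, 4, -1)
ext <- local_extrema(x)
ext$peaks       # 2 6
x[ext$peaks]    # 3 4
ext$valleys     # 4
x[ext$valleys]  # -5

# Applied to every column of a data frame: one list element per column
First <- data.frame(Seq1 = x, Seq2 = rev(x))
all_ext <- lapply(First, local_extrema)
```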

Efficient way to generate permutations of 0 and 1?

What I am trying to do is generate all possible permutations of 1 and 0 for a given sample size. For instance, with a sample of n=8 I would like the m = 2^8 = 256 possible permutations, i.e.:
I've written a function in R to do this, but beyond n=11 it takes a very long time to run. I would prefer a solution in R, but if it's in another programming language I can probably figure it out. Thanks!
PermBinary <- function(n){
  n.perms <- 2^n
  array <- matrix(0,nrow=n,ncol=n.perms)
  # array <- big.matrix(n, n.perms, type='integer', init=-5)
  for(i in 1:n){
    div.length <- ncol(array)/(2^i)
    div.num <- ncol(array)/div.length
    end <- 0
    while(end!=ncol(array)){
      end <- end +1
      start <- end + div.length
      end <- start + div.length -1
      array[i,start:end] <- 1
    }
  }
  return(array)
}

expand.grid is probably the best vehicle to get what you want.
For example if you wanted a sample size of 3 we could do something like
expand.grid(0:1, 0:1, 0:1)
For a sample size of 4
expand.grid(0:1, 0:1, 0:1, 0:1)
So what we want to do is find a way to automate that call.
If we had a list of the inputs we want to give to expand.grid we could use do.call to construct the call for us. For example
vals <- 0:1
tmp <- list(vals, vals, vals)
do.call(expand.grid, tmp)
So now the challenge is to automatically make the "tmp" list above in a fashion that we can dictate how many copies of "vals" we want. There are lots of ways to do this but one way is to use replicate. Since we want a list we'll need to tell it to not simplify the result or else we will get a matrix/array as the result.
vals <- 0:1
tmp <- replicate(4, vals, simplify = FALSE)
do.call(expand.grid, tmp)
Alternatively we can use rep on a list input (which I believe is faster because it doesn't have as much overhead as replicate but I haven't tested it)
tmp <- rep(list(vals), 4)
do.call(expand.grid, tmp)
Now wrap that up into a function to get:
binarypermutations <- function(n, vals = 0:1){
  tmp <- rep(list(vals), n)
  do.call(expand.grid, tmp)
}
Then call it with the sample size, like so: binarypermutations(5).
This gives a data.frame of dimensions 2^n x n as a result; transpose it and convert it to a different data type if you'd like.
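As one example of that conversion, the sketch below puts the result in the same layout as the question's PermBinary (n rows by 2^n columns, with the first row varying slowest). expand.grid() varies its first column fastest, so the columns are reversed before transposing. perm_matrix is a hypothetical helper name.

```r
binarypermutations <- function(n, vals = 0:1) {
  tmp <- rep(list(vals), n)
  do.call(expand.grid, tmp)
}

# Hypothetical helper: n x 2^n integer matrix in PermBinary's column order
perm_matrix <- function(n) {
  g <- binarypermutations(n)
  # rev() reverses the data frame's columns so the slowest-varying one comes first
  unname(t(as.matrix(rev(g))))
}

perm_matrix(2)
dim(perm_matrix(8))  # 8 256
```

Column j of perm_matrix(n) is then the binary representation of j - 1, matching the columns PermBinary(n) builds with its while loop (as integers rather than doubles).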

The answer above may be better since it uses base R; my first thought was to use data.table's CJ function:
library(data.table)
do.call(CJ, replicate(8, c(0, 1), FALSE))
It will be slightly faster (~15%) than expand.grid, so it will only be more valuable for extreme cases.

Multiply unique pairs of values in a vector and sum the result

I want to multiply and then sum the unique pairs of a vector, excluding pairs made of the same element, such that for c(1:4):
(1*2) + (1*3) + (1*4) + (2*3) + (2*4) + (3*4) == 35
The following code works for the example above:
x <- c(1:4)
bar <- NULL
for( i in 1:length(x)) { bar <- c( bar, i * c((i+1) : length(x)))}
sum(bar[ 1 : (length(bar) - 2)])
However, my actual data is a vector of rational numbers, not integers, so the (i+1) portion of the loop will not work. Is there a way to look at the next element of the set after i, e.g. j, so that I could write i * c(j : length(x))?
I understand that for loops are usually not the most efficient approach, but I could not think of how to accomplish this via apply etc. Examples of that would be welcome, too. Thanks for your help.

An alternative to a loop would be to use combn and multiply the combinations using the FUN argument. Then sum the result:
sum(combn(x = 1:4, m = 2, FUN = function(x) x[1] * x[2]))
# [1] 35
Even better is to use prod as FUN, as suggested by @bgoldst:
sum(combn(x = 1:4, m = 2, FUN = prod))
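A loop-free alternative that also works for non-integer values relies on an algebraic identity instead of combn(), so no pairs are generated at all. Squaring the sum counts every cross product twice and every square once: sum(x)^2 = sum(x^2) + 2 * (sum over i < j of x[i] * x[j]). The wanted sum is therefore (sum(x)^2 - sum(x^2)) / 2. pair_prod_sum is a hypothetical helper name.

```r
# Hypothetical helper: sum of x[i] * x[j] over all pairs i < j
pair_prod_sum <- function(x) (sum(x)^2 - sum(x^2)) / 2

pair_prod_sum(1:4)             # 35
pair_prod_sum(c(0.5, 1.5, 2))  # 4.75
```

This is O(n) instead of O(n^2), which matters for long vectors; for values of very different magnitudes the subtraction can lose some floating-point precision compared with summing the products directly.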

How to apply a distribution function for each row in data frame

I know similar questions have been asked on this site here, here, and here, but none of them tackles my problem.
I have a data frame, and I want to apply the rdirichlet function (from gtools) to each row, so that each row is used as alpha.
data = NULL
data <- data.frame(rbind(
oct = c(60, 32, 8),
sep = c(53, 35, 12),
ago = c(54, 40, 6)
))
data <- data/100*1000
library(gtools) # contains the function
sim <- 10000 # simulation
My first attempt was to use apply. It does work, but the output is not convenient for further analysis, because each row's computation becomes a vector:
p = apply(data, 1, function(x) rdirichlet(sim, alpha = x + 1))
I also tried a loop, without success:
p = NULL
for(i in 1:length(data)) {
  p[i] <- rdirichlet(sim, alpha = data[i] + 1)
}
Any tips on how I can solve this?

Well, firstly you might want to change data to x in the anonymous function inside apply, to match the x in function(x):
apply(data, 1, function(x) rdirichlet(sim, alpha = x + 1))
This works for me, in that it returns output with three columns (one per row of data) and 30000 rows (10000 draws of three components each).

Two important things here. First, vectorizing is the best way to go:
ans <- apply(data, 1, function(x) rdirichlet(sim, alpha = x + 1))
By doing this, you'll receive each row's computations as one column of ans: the sim x k matrix of draws for that row, stacked column by column.
Second, you'll then need to subset that column, for example to get the margin between the first two components of the first row:
margin <- ans[1:sim, 1] - ans[(sim + 1):(2 * sim), 1]
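To keep each month's draws as a proper sim x k matrix instead of a stacked column, the apply() result can be reshaped with array(), because apply() stores each returned sim x k matrix column by column. This sketch assumes that layout; to run without gtools it uses a hypothetical stand-in, rdirichlet_base(), which normalizes independent Gamma(alpha_j, 1) draws row by row (a standard way to sample from a Dirichlet). With gtools installed, gtools::rdirichlet can be used in its place.

```r
# Hypothetical stand-in for gtools::rdirichlet: normalized independent
# Gamma(alpha_j, 1) draws follow a Dirichlet(alpha) distribution.
rdirichlet_base <- function(n, alpha) {
  k <- length(alpha)
  # byrow = TRUE puts shape alpha[j] in column j (rgamma recycles alpha)
  g <- matrix(rgamma(n * k, shape = alpha), ncol = k, byrow = TRUE)
  g / rowSums(g)
}

data <- data.frame(rbind(
  oct = c(60, 32, 8),
  sep = c(53, 35, 12),
  ago = c(54, 40, 6)
))
data <- data / 100 * 1000
sim <- 10000

set.seed(1)
ans <- apply(data, 1, function(x) rdirichlet_base(sim, alpha = x + 1))

# Rebuild the per-month matrices: draws[, , month] is sim x k
draws <- array(ans, dim = c(sim, ncol(data), nrow(data)),
               dimnames = list(NULL, names(data), rownames(data)))

# Margin between the first two components for October
margin_oct <- draws[, 1, "oct"] - draws[, 2, "oct"]
```

Indexing by month name this way avoids hard-coding row offsets such as 1:sim, which is where the flat layout is easy to get wrong when sim changes.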

How to sum over range and calculate a series in R?

Here is the formula which I am trying to calculate in R.
So far, this is my approach using a simplified example
t <- seq(1, 2, 0.1)
expk <- function(k){exp(-2*pi*1i*t*k)}
set.seed(123)
dat <- ts(rnorm(100), start = c(1994,3), frequency = 12)
arfit <- ar(dat, order = 4, aic = FALSE) # represents \phi in the formula
tmp1 <- numeric(4)
for (i in seq_along(arfit$ar)){
  ek <- expk(i)
  arphi <- arfit$ar[i]
  tmp1[i] <- ek * arphi
}
tmp2 <- sum(tmp1)
denom = abs(1-tmp2)^2
s2 <- t/denom
Warning message:
In tmp1[i] <- ek * arphi :
number of items to replace is not a multiple of replacement length
I was trying to avoid a for loop and tried using sapply, as in the solutions to this question:
denom2 <- abs(1- sapply(seq_along(arfit$ar), function(x)sum(arfit$ar[x]*expk(x))))^2
but it doesn't seem to be correct. The problem is computing the sum of the series (over index k) when it also takes values from another vector, in this case t, which is in the numerator.
Any solutions?
Any suggestion for a test dataset, maybe using 0 and 1, to check whether the calculation in this loop is done correctly?

Typing up the answer determined in chat. Here's a solution involving vapply.
First correct expk to:
expk <- function(k){sum(exp(-2*pi*1i*t*k))}
Then you can create this function and vapply it:
myFun <- function(i) return(expk(i) * arfit$ar[i])
tmp2 <- sum(vapply(seq_along(arfit$ar), myFun, complex(1)))
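The chat answer sums over t inside expk, which collapses t to a single number. If instead the denominator is needed separately for each value of t (which s2 <- t/denom in the question suggests, since s2 has one value per t), one vectorized sketch builds a matrix of exponentials with outer(), one row per t and one column per lag k, and multiplies it by the AR coefficients. This is an assumption about the intended formula, which is not shown here; E, phi and phi_sum are illustrative names.

```r
t <- seq(1, 2, 0.1)
set.seed(123)
dat <- ts(rnorm(100), start = c(1994, 3), frequency = 12)
arfit <- ar(dat, order.max = 4, aic = FALSE)

phi <- arfit$ar       # AR coefficients phi_1, ..., phi_4
k <- seq_along(phi)   # lag index k

# E[j, m] = exp(-2*pi*i * t[j] * k[m]): one row per value of t, one column per lag
E <- exp(-2i * pi * outer(t, k))

# For each t, the sum over k of phi_k * E[j, k]: one complex number per t
phi_sum <- as.vector(E %*% phi)

denom <- Mod(1 - phi_sum)^2  # Mod() is what abs() returns for complex input
s2 <- t / denom              # numerator kept as in the question
```

The matrix product replaces the loop over k, and the recycling problem behind the warning disappears because each t keeps its own entry.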
