mapply for better performance - R

I want to apply a function to a matrix input a. The function should change the first element to c[a[1]] and each following element to b[a[i], a[i+1]], with i running from 1 up to ncol(a) - 1.
example input:
a <- matrix(c(1,4,3,1),nrow=1)
b <- matrix(1:25,ncol=5,nrow=5)
c <- matrix(4:8,ncol=5,nrow=1)
expected output:
> a
[1]  4 16 14  3
#c[a[1]] gave us the first element: 4
#b[a[1],a[2]] gave us the second element: 16
#b[a[2],a[3]] gave us the third element: 14
#b[a[3],a[4]] gave us the fourth element: 3
I've been trying to use mapply() without any success so far. The idea is to avoid loops, since they can cause a major performance decrease in R.

Step 1: using a single index to address a matrix
In R, matrix elements are stored in column-major order as a single vector, so A[i, j] is the same as A[(j-1)*nrow(A) + i]. Consider a random 3-by-3 matrix as an example:
set.seed(1); A <- round(matrix(runif(9), 3, 3), 2)
> A
[,1] [,2] [,3]
[1,] 0.27 0.91 0.94
[2,] 0.37 0.20 0.66
[3,] 0.57 0.90 0.63
Now, this matrix has 3 rows (nrow(A) = 3). Compare:
A[2,3] # 0.66
A[(3-1) * 3 + 2] # 0.66
Step 2: vectorizing
You can address multiple elements of a matrix at a time. However, you can only do this with single (linear) indexing (not entirely precise; see alexis_laz's remark below). For example, suppose you want to extract A[1,2] and A[3,1]. If you do:
A[c(1,3), c(2,1)]
# [,1] [,2]
# [1,] 0.91 0.27
# [2,] 0.90 0.57
You actually get a block. Now, if you use single indexing, you get what you need:
A[3 * (c(2,1) - 1) + c(1,3)]
# [1] 0.91 0.57
Step 3: getting the single index for your problem
Suppose n <- length(a) and you want to address these elements of b:
b[a[1], a[2]], b[a[2], a[3]], ..., b[a[n-1], a[n]]
Then you can use the single index nrow(b) * (a[2:n] - 1) + a[1:(n-1)].
Step 4: complete solution
Since a and c each have only a single row, you should store them as vectors rather than matrices.
a <- c(1,4,3,1)
c <- 4:8
If you are given matrices and have no choice (as is currently the case in your question), you can convert them into vectors with:
a <- as.numeric(a)
c <- as.numeric(c)
Now, as discussed, we build the index into matrix b:
n <- length(a)
b_ind <- nrow(b) * (a[2:n] - 1) + a[1:(n-1)]
You also need c[a[1]] as the first element of your final result, so we concatenate c[a[1]] and b[b_ind]:
a <- c(c[a[1]], b[b_ind])
# > a
# [1] 4 16 14 3
This approach is fully vectorized, and even faster than the *apply family.
alexis_laz's remark
alexis_laz reminds me that we can use a "matrix index" as well, i.e., we can also address matrix b via:
b[cbind(a[1:(n-1)],a[2:n])] ## or b[cbind(a[-n], a[-1])]
However, I think using the single (vector) index is slightly faster: the index matrix has to be read row by row to address b, so it incurs higher memory latency than a plain vector index.
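If you want to check this on your own data, a rough timing sketch with the microbenchmark package (assumed installed; on inputs this small any difference will be negligible) could look like:
## Rough timing sketch: single (vector) index vs. matrix index.
## microbenchmark is assumed to be installed; it is not part of base R.
library(microbenchmark)

a <- c(1, 4, 3, 1)
b <- matrix(1:25, ncol = 5, nrow = 5)
n <- length(a)

microbenchmark(
  single_index = b[nrow(b) * (a[2:n] - 1) + a[1:(n - 1)]],
  matrix_index = b[cbind(a[-n], a[-1])]
)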

Related

Selecting consecutive values from a bootstrap sample in R with repeated values

I'm not exactly sure how to go about this in R. I've got a data set with 40 values, some of which repeat, and I want to perform a small bootstrap on this dataset to find the mean of two or more consecutive values. For example, suppose I randomly select a value from the dataset provided below, say the very first value, which is 0.2, so x1 = 0.2. How can I make sure that, within the same for loop, R selects the next value in the dataset as x2, which here is also 0.2? It would then appear as x1 = 0.2 and x2 = 0.2.
I can't really think of a way to do this, since it would need to be repeated for each iteration, and because sample() picks any random value it is hard to pinpoint exactly which element was selected when there are repeated values.
I've provided sample code that calculates the mean for 1 observation, and I would like to get it to work for 2 consecutive observations, so that I can calculate the means individually and display them.
If anyone has any way to handle this I would appreciate it.
Thanks ahead of time.
x=c(0.20,0.20,0.21,0.21,0.21,0.20,0.19,0.18,0.16,0.10,
0.02,-0.02,0.01,0.03,0.07,0.14,0.22,0.13,0.12,
0.16,0.17,0.18,0.18,0.17,0.15,0.15,0.13,0.12,
0.10,0.08,0.06,0.04,0.03,0.02,0.03,0.05,0.34,
0.13,0.11,0.12)
B <- 500
result1 <- numeric(B)
# result2 <- numeric(B)
for (b in 1:B){
  x1 <- sample(x = x, size = 1, replace = TRUE)
  # x2 <-
  result1[b] <- x1
  # result2[b] <- x2
}
mean1 <- mean(result1)
# mean2 <- mean(result2)
A simple approach could be:
result <- matrix(nrow = B, ncol = 2)
for (b in 1:B){
  idx1 <- sample(seq_along(x), size = 1)
  idx2 <- idx1 %% length(x) + 1
  result[b, 1] <- x[idx1]
  result[b, 2] <- x[idx2]
}
storing the results in a matrix:
> result
[,1] [,2]
[1,] 0.21 0.21
[2,] 0.12 0.20
[3,] 0.21 0.21
[4,] 0.10 0.02
[5,] 0.10 0.02
[6,] 0.21 0.20
[7,] 0.02 -0.02
[8,] -0.02 0.01
[9,] 0.21 0.20
[10,] 0.17 0.15
Sample the indices of x, then use them to subset x for result1. Use the sampled index + 1 to subset x for result2. However, you also need a wrap-around so that if you sample the last member of x, you use the first as the "next" value.
B <- 500
result1 <- numeric(B)
result2 <- numeric(B)
for (i in 1:B) {
  j <- sample(seq_along(x), 1)
  k <- if (j == 40) 1 else j + 1
  result1[i] <- x[j]
  result2[i] <- x[k]
}
mean(result1)
#> [1] 0.12618
mean(result2)
#> [1] 0.13034
Note also that since R is vectorized, you don't need a loop here at all. You could just do:
result1 <- sample(seq_along(x), 500, replace = TRUE)
result2 <- result1 + 1
result2[result2 == 41] <- 1
mean(x[result1])
#> [1] 0.12568
mean(x[result2])
#> [1] 0.12596
Could you work out all the possible consecutive means and then sample from that? How about:
library(RcppRoll)
x=c(0.20,0.20,0.21,0.21,0.21,0.20,0.19,0.18,0.16,0.10,
0.02,-0.02,0.01,0.03,0.07,0.14,0.22,0.13,0.12,
0.16,0.17,0.18,0.18,0.17,0.15,0.15,0.13,0.12,
0.10,0.08,0.06,0.04,0.03,0.02,0.03,0.05,0.34,
0.13,0.11,0.12)
rollmean <- roll_mean(x,2)
r <- sample(rollmean, 500, replace= T)
hist(r)
Which gives you a histogram of the 500 sampled consecutive-pair means.

Angle between vector and list of vectors in R

When comparing two vectors it is simple to calculate the angle between them, but in R it is noticeably harder to calculate the angle between a vector and a matrix of vectors efficiently.
Say you have a 2D vector A=(2, 0) and then a matrix B={(1,3), (-2,4), (-3,-3), (1,-4)}. I am interested in working out the smallest angle between A and the vectors in B.
If I try to use
min(acos( sum(a%*%b) / ( sqrt(sum(a %*% a)) * sqrt(sum(b %*% b)) ) ))
it fails because the arguments are non-conformable.
Is there any code similar to that of above which can handle a vector and matrix?
Note: at the risk of being marked as a duplicate, the solutions found in several other sources do not apply in this case.
Edit: The reason for this is that I have a large matrix X, and A is just one row of it. I am reducing the number of elements based solely on the angle of each vector. The first element of B is the first in X; then, if the angle between any element in B and the next element X[,2] (here A) is greater than a certain tolerance, X[,2] is added to B. I am just using B <- rbind(B, X[,2]) to do this, so B ends up being a matrix.
You don't describe the format of A and B in detail, so I assume A is a plain vector and B is a matrix with one vector per row.
(A <- c(2, 0))
# [1] 2 0
(B <- rbind(c(1,3), c(-2,4), c(-3,-3), c(1,-4)))
# [,1] [,2]
# [1,] 1 3
# [2,] -2 4
# [3,] -3 -3
# [4,] 1 -4
Solution 1 with apply():
apply(B, 1, FUN = function(x){
  acos(sum(x*A) / (sqrt(sum(x*x)) * sqrt(sum(A*A))))
})
# [1] 1.249046 2.034444 2.356194 1.325818
Solution 2 with sweep(): (replace sum() above with rowSums())
sweep(B, 2, A, FUN = function(x, y){
  acos(rowSums(x*y) / (sqrt(rowSums(x*x)) * sqrt(rowSums(y*y))))
})
# [1] 1.249046 2.034444 2.356194 1.325818
Solution 3 with split() and mapply:
mapply(function(x, y){
  acos(sum(x*y) / (sqrt(sum(x*x)) * sqrt(sum(y*y))))
}, split(B, row(B)), list(A))
# 1 2 3 4
# 1.249046 2.034444 2.356194 1.325818
The vector of dot products between the rows of B and the vector A is B %*% A. The vector lengths of the rows of B are sqrt(rowSums(B^2)).
To find the smallest angle, you want the largest cosine, but you don't actually need to compute the angle, so the length of A doesn't matter.
Thus the row with the smallest angle will be given by row <- which.max((B %*% A)/sqrt(rowSums(B^2))). With Darren's data, that's row 1.
If you really do need the smallest angle, then you can apply the formula for two vectors to B[row,] and A. If you need all of the angles, then the formula would be
acos((B %*% A)/sqrt(rowSums(B^2))/sqrt(sum(A^2)))
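Putting that together with the example data from the earlier answer, a minimal sketch might be:
## Minimal sketch using the A and B defined in the earlier answer.
A <- c(2, 0)
B <- rbind(c(1, 3), c(-2, 4), c(-3, -3), c(1, -4))

cosines <- (B %*% A) / sqrt(rowSums(B^2))  # proportional to cos(angle); |A| omitted
row <- which.max(cosines)                  # row of B closest in direction to A
row
# [1] 1

angles <- acos((B %*% A) / sqrt(rowSums(B^2)) / sqrt(sum(A^2)))
min(angles)
# [1] 1.249046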

How to find the maximum value within a loop in R

I have an expression
qbinom(0.05, n, .47) - 1
and I want to create a loop which iterates this expression over n for n = 20, ..., 200. Each iteration of this loop produces a number, and I want to take the maximum of the 181 numbers produced. So, something like:
for (n in 20:200) {
max(qbinom(0.05, n, .47)-1)
But I'm not sure how exactly to do this.
Thanks!
First, I will show you how to do this with a loop.
n <- 20:200
MAX = -Inf ## initialize maximum
for (i in 1:length(n)) {
  x <- qbinom(0.05, n[i], 0.47) - 1
  if (x > MAX) MAX <- x
}
MAX
# [1] 81
Note, I am not keeping a record of all 181 values generated. Each value is treated as a temporary value and will be overwritten in the next iteration. In the end, we only have a single value MAX.
If you want to retain all the values at the same time, we first need to initialize a vector to hold them.
n <- 20:200
MAX = -Inf ## initialize maximum
x <- numeric(length(n)) ## vector to hold record
for (i in 1:length(n)) {
  x[i] <- qbinom(0.05, n[i], 0.47) - 1
  if (x[i] > MAX) MAX <- x[i]
}
## check the first few values of `x`
head(x)
# [1] 5 5 6 6 6 7
MAX
# [1] 81
Now for the vectorized solution.
max(qbinom(0.05, 20:200, 0.47) - 1)
# [1] 81
R functions related to probability distributions are vectorized in the same fashion. For those related to binomial distributions, you can read ?rbinom for details.
Note, the vectorization is achieved via the recycling rule. For example, when you call:
qbinom(0.05, 1:4, 0.47)
R will first recycle the arguments:
p:    0.05 0.05 0.05 0.05
size:    1    2    3    4
prob: 0.47 0.47 0.47 0.47
then evaluate
qbinom(p[i], size[i], prob[i])
via a C-level loop.
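As a quick sanity check (just a sketch, using the same numbers as above), the vectorized call agrees with evaluating the quantile function one n at a time:
## Sanity check: vectorized qbinom vs. explicit element-wise evaluation.
n <- 20:200
vectorized  <- qbinom(0.05, n, 0.47) - 1
elementwise <- sapply(n, function(ni) qbinom(0.05, ni, 0.47) - 1)
all.equal(vectorized, elementwise)
# [1] TRUE
max(vectorized)
# [1] 81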
Follow-up
How would I be able to know which of the 20:200 corresponds to the maximum using the vectorization solution?
We can use
x <- qbinom(0.05, 20:200, 0.47) - 1
i <- which.max(x)
i
# [1] 179
Note, i is the position in vector 20:200. To get the n you want, you need:
(20:200)[i]
# 198
The maximum is
x[i]
# [1] 81

Perform an operation on a vector using the previous value after an initial value

In Excel, it's easy to perform a calculation on a previous cell by referencing that earlier cell. For example, starting from an initial value of 100 (step = 0), each next step would be 0.9 * previous + 9, simply by dragging the formula down from the first cell (step = 1). The initial value plus the next 10 steps would look like:
step value
[1,] 0 100.00000
[2,] 1 99.00000
[3,] 2 98.10000
[4,] 3 97.29000
[5,] 4 96.56100
[6,] 5 95.90490
[7,] 6 95.31441
[8,] 7 94.78297
[9,] 8 94.30467
[10,] 9 93.87420
[11,] 10 93.48678
I've looked around the web and StackOverflow, and the best I could come up with is a for loop (below). Are there more efficient ways to do this? Is it possible to avoid a for loop? It seems like most functions in R (such as cumsum, diff, apply, etc) work on existing vectors instead of calculating new values on the fly from previous ones.
# for loop. This works
value <- 100  # Initial value
for (i in 2:11) {
  current <- 0.9 * value[i-1] + 9
  value <- append(value, current)
}
cbind(step = 0:10, value)  # Prints the example output shown above
It seems like you're looking for a way to do recursive calculations in R. Base R has two ways of doing this which differ by the form of the function used to do the recursion. Both methods could be used for your example.
Reduce can be used with recursion equations of the form v[i+1] = function(v[i], x[i]), where v is the calculated vector and x an input vector; i.e. where the (i+1)-th output depends only on the i-th values of the calculated and input vectors, and the calculation performed by function(v, x) may be nonlinear. For your case, this would be:
value <- 100
nout <- 10
# v[i+1] = function(v[i], x[i])
v <- Reduce(function(v, x) .9*v + 9, x=numeric(nout), init=value, accumulate=TRUE)
cbind(step = 0:nout, v)
filter is used with recursion equations of the form y[i] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p], where y is the calculated vector and x an input vector; i.e. where the output can depend linearly upon lagged values of the calculated vector as well as the i-th value of the input vector. For your case, this would be:
value <- 100
nout <- 10
# y[i] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p]
y <- c(value, stats::filter(x=rep(9, nout), filter=.9, method="recursive", sides=1, init=value))
cbind(step = 0:nout, y)
For both functions, the length of the output is given by the length of the input vector x.
Both of these approaches give your result.
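As a quick check (not part of the original answer), you can confirm that the two recursions produce the same sequence, assuming the v and y computed in the two snippets above:
## Quick check that the Reduce and filter results above agree.
all.equal(as.numeric(v), as.numeric(y))
# [1] TRUE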
Use our knowledge of geometric series: unrolling the recursion value = 0.9 * previous + 9 from the initial value 100 gives a closed form that can be evaluated for all steps at once.
i <- 0:10
0.9 ^ i * 100 + 9 * (0.9 ^ i - 1) / (0.9 - 1)
#[1] 100.00000 99.00000 98.10000 97.29000 96.56100 95.90490 95.31441 94.78297 94.30467 93.87420 93.48678
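If you want to convince yourself that this closed form matches the step-by-step recursion, a quick check (just a sketch) is:
## Verify the closed form against the explicit recursion.
value <- 100
for (i in 2:11) value[i] <- 0.9 * value[i - 1] + 9

i <- 0:10
closed_form <- 0.9^i * 100 + 9 * (0.9^i - 1) / (0.9 - 1)

all.equal(value, closed_form)
# [1] TRUE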
You could also use purrr::accumulate (note that iterating over 0:10 with .init = 100 yields 12 values, one step more than the table above; iterate over 1:10 to reproduce it exactly):
data.frame(value = purrr::accumulate(0:10, ~ .x * .9 + 9, .init = 100))
value
1 100.00000
2 99.00000
3 98.10000
4 97.29000
5 96.56100
6 95.90490
7 95.31441
8 94.78297
9 94.30467
10 93.87420
11 93.48678
12 93.13811
.init is the initial value, and there is also a .dir argument if you want to control the direction ("forward" is the default).

R: Vectorize Finite Difference Equations

I'm trying to move some Fortran code to R for finite differences related to chemical kinetics.
Sample Fortran loop:
DOUBLE PRECISION, DIMENSION (2000,2) :: data=0.0
DOUBLE PRECISION :: k1=5.0, k2=20.0, dt=0.0005
DO i=2, 2000
  data(i,1) = data(i-1,1) + data(i-1,1)*(-k1)*dt
  data(i,2) = data(i-1,2) + ( data(i-1,1)*k1*dt - data(i-1,2)*k2*dt )
  ...
END DO
The analogous R code:
k1=5
k2=20
dt=0.0005
data=data.frame(cbind(c(500,rep(0,1999)),rep(0,2000)))
a.fun=function(y){
  y2=y-k1*y*dt
  return(y2)
}
apply(data,2,a.fun)
This overwrites my first value in the data frame and leaves zeros elsewhere. I'd like to run this vectorized rather than with a for loop, since loops are so slow in R. Also, my function only calculates the first column so far; I can't get the second column working until I get the syntax right on the first.
It's not necessarily true that R is bad at loops. It very much depends on what you are doing. Using k1, k2, dt and data from the question (i.e. the four lines beginning with k1=5) and formulating the problem in terms of an iterated matrix, the loop in the last line below returns nearly instantaneously on my PC:
z <- as.matrix(data)
m <- matrix(c(1-k1*dt, k1*dt, 0, 1-k2*dt), 2)
for(i in 2:nrow(z)) z[i, ] <- m %*% z[i-1, ]
(You could also try storing the vectors in columns of z rather than rows, since R stores matrices by column; see the sketch after the output below.)
Here is the first bit of the result:
> head(z)
X1 X2
[1,] 500.0000 0.000000
[2,] 498.7500 1.250000
[3,] 497.5031 2.484375
[4,] 496.2594 3.703289
[5,] 495.0187 4.906905
[6,] 493.7812 6.095382
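The column-storage variant mentioned above might look something like this (a sketch; z2 and the final transpose are illustrative names, not part of the original answer):
## Column-per-step variant of the same iteration: z2 holds one time step per
## column, so each update writes a contiguous column of the matrix.
z2 <- matrix(0, nrow = 2, ncol = nrow(data))
z2[, 1] <- as.numeric(data[1, ])
m <- matrix(c(1 - k1*dt, k1*dt, 0, 1 - k2*dt), 2)
for (i in 2:ncol(z2)) z2[, i] <- m %*% z2[, i - 1]
head(t(z2))   # same values as head(z) above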
Maybe this can help.
I think you need the initial condition for data[1,2] as well. I assumed data[1,1] = 500 and data[1,2] = 0 as the initial conditions.
The code goes like this:
> ## Define two vectors x and y
> x <- seq(from=0,length=2000,by=0)
> y <- seq(from=0,length=2000,by=0)
>
> ## Constants
> k1 = 5.0
> dt = 0.0005
> k2 = 20.0
>
> ## Initialize x[1]=500 and y[1]=0
> x[1]=500
> y[1] = 0
>
> for (i in 2:2000){
+ x[i]=x[i-1]+x[i-1]*-k1*dt
+ y[i] = y[i-1]+x[i-1]*k1*dt-y[i-1]*k2*dt
+ }
>
> finaldata <- data.frame(x,y)
> head(finaldata)
x y
1 500.0000 0.000000
2 498.7500 1.250000
3 497.5031 2.484375
4 496.2594 3.703289
5 495.0187 4.906905
6 493.7812 6.095382
I hope this helps.
