Part 2: How to make nested for-loop do every permutation - r

This is a follow-up to the first question I asked. I am still missing what is probably an obvious lesson, and I can't extend the code that originally worked.
I'm trying to store the results of a nested loop that runs every permutation of a set of calculations.
I still don't understand how to set up the index. I am now trying to do this with three nested loops (and really, I want to understand how to do it with any number of them). The code below doesn't quite work: the third loop doesn't index over every combination (but I get 1000 calculations anyway). I commented the line that is clearly wrong.
With only two loops, this was sufficient, per the answer of my first post:
index <- 10*(i-1) + j
So not sure why the way I changed it for 3 loops doesn't work, but it's obviously wrong.
iter = 10 #length of parameter ranges to test
perm = 3 #how many parameters are being varied
n_c <- 1
m_c <- 1
n_n <- 1
m_n <- 1
n_v <- 1
my_data_c <- vector("numeric", iter^perm)
my_data_n <- vector("numeric", iter^perm)
my_data_v <- vector("numeric", iter^perm)
rho_c_store <- vector("numeric", iter)
rho_n_store <- vector("numeric", iter)
rho_v_store <- vector("numeric", iter)
for (i in 1:iter) {
  # you can move this assignment to the outer loop
  rho_c <- (i / 10)
  x <- (rho_c * n_c)/m_c
  for (j in 1:iter) {
    rho_n <- (j / 10)
    y <- (rho_n * n_n)/m_n
    for (k in 1:iter){
      rho_v <- (k / 10)
      z <- rho_v/n_v
      index <- iter*(i-2)+j+ k #Clearly where the error is
      rho_c_store[index] <- rho_c
      rho_n_store[index] <- rho_n
      rho_v_store[index] <- rho_v
      my_data_c[index] <- x
      my_data_n[index] <- y
      my_data_v[index] <- z
    }
  }
}
my_data <- cbind(rho_c_store, rho_n_store, rho_v_store, my_data_c, my_data_n,my_data_v)
print(my_data)

Think of it as counting: the ones move the fastest, next come the tens, then the hundreds: 001, 002, ..., 010, ..., 099, 100, 101. You can think of your loop variables i, j, k as the digits of a number: k moves the fastest, j slower, and i slower still. To get the right index you have to multiply i by 100, j by 10 and k by one, giving you the index 100 * (i-1) + 10 * (j-1) + k. The -1 for i and j is necessary because the loop variables start from 1, but we want to start by adding 0 * 100 and 0 * 10.
So all you need to do is change your index calculation to index <- iter^2*(i-1)+ iter*(j-1) + k.
# I stripped the example to the essentials for easier understanding
iter = 10 #length of parameter ranges to test
perm = 3 #how many parameters are being varied
my_data_c <- vector("numeric", iter^perm)
my_data_n <- vector("numeric", iter^perm)
my_data_v <- vector("numeric", iter^perm)
for (i in 1:iter) {
  x <- i / 10
  for (j in 1:iter) {
    y <- j / 10
    for (k in 1:iter){
      z <- k / 10
      index <- iter^2*(i-1)+ iter*(j-1) + k
      my_data_c[index] <- x
      my_data_n[index] <- y
      my_data_v[index] <- z
    }
  }
}
my_data <- cbind(my_data_c, my_data_n, my_data_v)
print(my_data)
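For any number of nested loops, the same place-value idea carries over: the counter of the innermost loop gets weight 1, the next one out gets weight iter, then iter^2, and so on. Here is a minimal sketch of that rule, assuming every loop runs from 1 to the same iter. flat_index is a hypothetical helper name, not something from the code above.

```r
# Hypothetical helper: map loop counters (outermost first) to a position
# in a flat vector of length iter^length(counters).
flat_index <- function(counters, iter) {
  n_loops <- length(counters)
  # weights are iter^(n_loops - 1), ..., iter^1, iter^0
  weights <- iter^((n_loops - 1):0)
  # every counter is shifted to start at 0, then 1 is added back at the end
  sum((counters - 1) * weights) + 1
}

flat_index(c(1, 1, 1), iter = 10)    # first cell: 1
flat_index(c(2, 3, 4), iter = 10)    # 100 * 1 + 10 * 2 + 4 = 124
flat_index(c(10, 10, 10), iter = 10) # last cell: 1000
```

With three counters this is the same formula as iter^2*(i-1) + iter*(j-1) + k; the final "+ 1" only makes up for also subtracting 1 from the innermost counter.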
Anyway, while it's great to develop a deep understanding of these things, you're certainly on the right track when you move away from looping with explicit indices and use R's toolkit for such tasks.
Hope this helps

In case anyone else wants to try this...expand.grid is definitely the way to go. Still not sure how to deal with the nested loop from a coding perspective, but oh well..
n_c <- 1
m_c <- 1
n_n <- 1
m_n <- 1
n_v <- 1
c = c(1,2)
n = c(10,20)
v = c(100,200)
parm_length = 3
iter = length(c)
test <- data.frame(matrix(ncol=parm_length, nrow=iter^parm_length))
test[] <- expand.grid(c,n,v)
x <- (test[,1] * n_c)/m_c
y <- (test[,2]* n_n)/m_n
z <- (test[,3]/n_v)
group <- cbind(test,x,y,z)
print(group)
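One detail that may matter if you compare this against the loop output: expand.grid varies its first argument fastest, whereas in a nested loop the innermost variable changes fastest. A small sketch of how the two orderings can be lined up, using the same toy values as above (the column names c, n, v and the name loop_order are only chosen here for illustration):

```r
# expand.grid: the FIRST argument changes fastest
grid <- expand.grid(c = c(1, 2), n = c(10, 20), v = c(100, 200))
grid$c   # 1 2 1 2 1 2 1 2

# To mimic loop order (last variable fastest), pass the arguments in
# reverse and then put the columns back in the original order.
loop_order <- expand.grid(v = c(100, 200), n = c(10, 20), c = c(1, 2))
loop_order <- loop_order[, c("c", "n", "v")]
loop_order$c   # 1 1 1 1 2 2 2 2
loop_order$v   # 100 200 100 200 100 200 100 200
```

Either ordering contains the same 8 combinations; only the row order differs.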

Related

Is it possible to use vector math in R for a summation involving intervals?

Title's a little rough, open to suggestions to improve.
I'm trying to calculate time-average covariances for a 500 length vector.
This is the equation we're using
The result I'm hoping for is a vector with an entry for k from 0 to 500 (0 would just be the variance of the whole set).
I've started with something like this, but I know I'll need to reference the gap (i) in the first mean comparison as well:
x <- rnorm(500)
xMean <-mean(x)
i <- seq(1, 500)
dfGam <- data.frame(i)
dfGam$gamma <- (1/(500-dfGam$i))*(sum((x-xMean)*(x[-dfGam$i]-xMean)))
Is it possible to do this using vector math or will I need to use some sort of for loop?
Here's the for loop that I've come up with for the solution:
gamma_func <- function(input_vec) {
  output_vec <- c()
  input_mean <- mean(input_vec)
  iter <- seq(1, length(input_vec)-1)
  for(val in iter){
    iter2 <- seq((val+1), length(input_vec))
    gamma_sum <- 0
    for(val2 in iter2){
      gamma_sum <- gamma_sum + (input_vec[val2]-input_mean)*(input_vec[val2-val]-input_mean)
    }
    output_vec[val] <- (1/length(iter2))*gamma_sum
  }
  return(output_vec)
}
Thanks
Using data.table, mostly for the shift function to make x_{t - k}, you can do this:
library(data.table)
gammabar <- function(k, x){
  xbar <- mean(x)
  n <- length(x)
  df <- data.table(xt = x, xtk = shift(x, k))[!is.na(xtk)]
  df[, sum((xt - xbar)*(xtk - xbar))/n]
}
gammabar(k = 10, x)
# [1] -0.1553118
The filter [!is.na(xtk)] starts the sum at t = k + 1, because xtk will be NA for the first k indices due to being shifted by k.
Reproducible x
x <- c(0.376972124936433, 0.301548373935665, -1.0980231706536, -1.13040590360378,
-2.79653431987176, 0.720573498411587, 0.93912102300901, -0.229377746707471,
1.75913134696347, 0.117366786802848, -0.853122822287008, 0.909259181618213,
1.19637295955276, -0.371583903741348, -0.123260233287436, 1.80004311672545,
1.70399587729432, -3.03876460529759, -2.28897494991878, 0.0583034949929225,
2.17436525195634, 1.09818265352131, 0.318220322390854, -0.0731475581637693,
0.834268741278827, 0.198750636733429, 1.29784138432631, 0.936718306241348,
-0.147433193833294, 0.110431994640128, -0.812504663900505, -0.743702167768748,
1.09534507180741, 2.43537370755095, 0.38811846676708, 0.290627670295127,
-0.285598287083935, 0.0760147178373681, -0.560298603759627, 0.447188372143361,
0.908501134499943, -0.505059597708343, -0.301004012157305, -0.726035976548133,
-1.18007702699501, 0.253074712637114, -0.370711296884049, 0.0221795637601637,
0.660044122429767, 0.48879363533552)
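If you would rather stay in base R, the same sum can be written with plain vector indexing and then applied to every lag with sapply. This is only a sketch under the same convention as gammabar above (dividing by n, not by n - k); gamma_base and x_demo are names made up for this example. As far as I know, acf(..., type = "covariance") uses the same 1/n scaling, so it is used here as a cross-check.

```r
# Hypothetical base-R version of the lag-k autocovariance (1/n scaling).
# Valid for k = 0, ..., length(x) - 1; there are no pairs left at k = length(x).
gamma_base <- function(k, x) {
  n <- length(x)
  d <- x - mean(x)                  # deviations from the overall mean
  sum(d[(k + 1):n] * d[1:(n - k)]) / n
}

set.seed(1)
x_demo <- rnorm(500)                # separate demo vector, 500 values

# one value per lag, k = 0 first (that entry is the 1/n variance)
gammas <- sapply(0:499, gamma_base, x = x_demo)

# cross-check against the built-in estimator
acf_gammas <- drop(acf(x_demo, lag.max = 499, type = "covariance", plot = FALSE)$acf)
all.equal(gammas, acf_gammas)
```

To get the n - k scaling used in the for loop from the question, multiply the entry for lag k by n / (n - k).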

What is a better way to write this nested for loop in R?

I am writing a for loop to calculate a numerator which is part of a larger formula. I used a nested for loop, but it is taking a lot of time to compute. What would be a better way to do this?
city is a dataframe with the following columns: pop, not.white, pct.not.white
n <- nrow(city)
numerator = 0
for(i in 1:n) {
  ti <- city$pop[i]
  pi<- city$pct.not.white[i]
  for(j in 1:n) {
    tj <- city$pop[j]
    pj <- city$pct.not.white[j]
    numerator = numerator + (ti * tj) * abs(pi -pj)
  }
}
Use the following toy data for result validation.
set.seed(0)
city <- data.frame(pop = runif(101), pct.not.white = runif(101))
The most obvious "vectorization":
# n <- nrow(city)
titj <- tcrossprod(city$pop)
pipj <- outer(city$pct.not.white, city$pct.not.white, "-")
numerator <- sum(titj * abs(pipj))
This will probably run into memory problems if n > 5000, because titj and pipj are both n x n matrices.
A clever workaround (exploiting symmetry; more memory efficient "vectorization"):
## see https://stackoverflow.com/a/52086291/4891738 for function: tri_ind
n <- nrow(city)
ij <- tri_ind(n, lower = TRUE, diag = FALSE)
titj <- city$pop[ij$i] * city$pop[ij$j]
pipj <- abs(city$pct.not.white[ij$i] - city$pct.not.white[ij$j])
numerator <- 2 * crossprod(titj, pipj)[1]
The ultimate solution is to write a C / C++ loop, which I will not showcase.
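A possible middle ground, sketched here as an assumption rather than a benchmarked recommendation: keep one loop over i and vectorize only the inner loop over j. That needs only O(n) memory at a time, and on the toy data it can be checked against the outer-based version. The names pop, p, row_terms, numerator_rows and numerator_outer are introduced just for this example.

```r
set.seed(0)
city <- data.frame(pop = runif(101), pct.not.white = runif(101))

pop <- city$pop
p <- city$pct.not.white

# For each i, the inner loop over j collapses to one vectorized sum.
row_terms <- vapply(
  seq_along(pop),
  function(i) pop[i] * sum(pop * abs(p[i] - p)),
  numeric(1)
)
numerator_rows <- sum(row_terms)

# Reference value from the full n x n "vectorization"
numerator_outer <- sum(tcrossprod(pop) * abs(outer(p, p, "-")))

all.equal(numerator_rows, numerator_outer)
```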

Pricing of Asian Option using R

The pricing of the Asian option is approximated, using Monte Carlo simulation, by:
delta <- 1/12
T <- 2
S0 <- 100
sigma <- 0.20
K <- 100
r <- 0.01
n <- 10^4
m <- T/delta
S <- S0
for(i in 1:n) {
  for(j in 1:m) {
    W <- rnorm(1)
    Si <- S[length(S)]*exp((r-0.5*sigma^2)*delta + sigma*sqrt(delta)*W)
    S <- c(S, Si)
  }
  Si.bar <- mean(S[-1])
  Ci <- exp(-r*T)*max(Si.bar - K, 0)
}
mean(Ci)
The for(j in 1:m) loop runs perfectly, I think. But when I run it n times using for(i in 1:n), S gets smaller and smaller as n grows, decreasing to almost zero. This leads to a mean (Si.bar <- mean(S[-1])) well below the strike price, K = 100.
I can't figure out what is wrong with the last two lines of code. I'm getting a value of 0 for the Asian call option, due to the payoff function. The correct value of this option, mean(Ci), is approximately 7.
There's a couple of issues with your code. Firstly, it's inefficient in R to build a vector by repeated concatenation. Instead, you should allocate the vector up front and then assign to its members.
Secondly, as I understand it, the aim is to repeat the inner loop n times and store the output into members of a vector C before taking the mean. That's not what you're doing at the moment - each iteration of the outer loop makes S longer and overwrites Ci such that the last statement, mean(Ci) is meaningless.
Here's an amended version of the code. I've used plyr partly to make the code neater, and partly for its progress bar functionality.
library(plyr)
delta <- 1/12
T <- 2
S0 <- 100
sigma <- 0.20
K <- 100
r <- 0.01
n <- 10^4
m <- T/delta
S <- numeric(m + 1)
S[1] <- S0
asian_price <- function() {
  for(j in 1:m) {
    W <- rnorm(1)
    S[j + 1] <- S[j] * exp((r - 0.5 * sigma^2) * delta + sigma * sqrt(delta) * W)
  }
  Si.bar <- mean(S[-1])
  exp(-r * T) * max(Si.bar - K, 0)
}
C <- raply(n, asian_price(), .progress = "text")
mean(C)
# [1] 7.03392
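For what it's worth, both loops can also be dropped by drawing all the normal shocks at once. The following is a sketch of that idea with the same parameters, not a drop-in replacement for the code above: maturity stands in for T (to avoid masking TRUE's shorthand), and W, log_steps, paths and payoff are names chosen for this example. The seed is arbitrary, so the estimate will differ slightly from the value printed above.

```r
set.seed(42)                       # arbitrary seed, only for repeatability

delta <- 1/12
maturity <- 2                      # called T in the code above
S0 <- 100
sigma <- 0.20
K <- 100
r <- 0.01
n <- 10^4                          # number of simulated paths
m <- round(maturity / delta)       # number of monthly steps (24)

# one row per path, one column per time step
W <- matrix(rnorm(n * m), nrow = n)
log_steps <- (r - 0.5 * sigma^2) * delta + sigma * sqrt(delta) * W

# cumulative sum of log-returns along each path gives the log price path
paths <- S0 * exp(t(apply(log_steps, 1, cumsum)))

# discounted payoff on the average of the m monitored prices (S0 excluded)
payoff <- exp(-r * maturity) * pmax(rowMeans(paths) - K, 0)
mean(payoff)
```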

Loop inside a loop in R

I am trying to create an R code that puts another loop inside of the one I've already created. Here is my code:
t <- rep(1,1000)
omega <- seq(from=1,to=12,by=1)
for(i in 1:1000){
  omega <- setdiff(omega,sample(1:12,1))
  t[i] <- length(omega)
  remove <- 0
  f <- length(t [! t %in% remove]) + 1
}
When I run this code, I get f, the number of trials it takes for omega to become empty, but I want to run 10000 iterations of this experiment.
replicate is probably how you want to run the outer loop. There's also no need for the f assignment to be inside the loop. Here I've moved it outside and converted it to simply count of the elements of t that are greater than 0, plus 1.
result <- replicate(10000, {
  t <- rep(1, 1000)
  omega <- 1:12
  for(i in seq_along(t)) {
    omega <- setdiff(omega,sample(1:12,1))
    t[i] <- length(omega)
  }
  sum(t > 0) + 1
})
I suspect your code could be simplified in other ways as well, and also that you could just write down the distribution that you're looking for without simulation. I believe your variable of interest is just how long until you get at least one of each of the numbers 1:12, yes?
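If that reading is right, this is the classic coupon collector problem, and the expected number of draws has a closed form: 12 * (1 + 1/2 + ... + 1/12). A quick sketch comparing that value with a small simulation (the while loop is just one possible way to simulate it; n_values, seen and draws_needed are names made up here):

```r
n_values <- 12

# closed-form expectation: n * (1 + 1/2 + ... + 1/n)
expected_draws <- n_values * sum(1 / seq_len(n_values))
expected_draws   # about 37.24

set.seed(1)      # arbitrary seed
draws_needed <- replicate(10000, {
  seen <- logical(n_values)        # which of 1:12 have turned up so far
  draws <- 0
  while (!all(seen)) {
    seen[sample.int(n_values, 1)] <- TRUE
    draws <- draws + 1
  }
  draws
})
mean(draws_needed)                 # should land close to expected_draws
```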
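Are you just looking to run your existing loop 10,000 times, like below? Note that t and omega have to be reset at the start of every run; otherwise omega is already empty after the first run.
f <- rep(NA, 10000)
for(j in 1:10000) {
  # reset the state for each run of the experiment
  t <- rep(1,1000)
  omega <- seq(from=1,to=12,by=1)
  for(i in 1:1000){
    omega <- setdiff(omega,sample(1:12,1))
    t[i] <- length(omega)
  }
  remove <- 0
  f[j] <- length(t [! t %in% remove]) + 1
}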

Repeating a for loop in R

Suppose I have a 10 x 10 matrix. I want to randomly choose 2 numbers from each column and take the square of the difference of these numbers. I wrote the R code for that and I get 10 values, but I wish to repeat this, say, 100 times, in which case I need to get 100*10=1000 numbers. How could I do that?
x <- rnorm(100)
m <- 10
n <- 10
X <- matrix(x,m,n)
q2 <- numeric(m) # storage for the 10 squared differences
for (i in 1:m ) {
  y <- sample(X[,i],2,rep=FALSE)
  q2[i] <- (y[1]-y[2])^2
}
Or, as @Davide Passaretti and @nrussell mentioned in the comments, you can use replicate:
f1 <- function(x, m){
  q2 <- vector(mode='numeric', length= m)
  for(i in 1:m){
    y <- sample(x[,i], 2, rep=FALSE)
    q2[i] <- (y[1]-y[2])^2
  }
  q2
}
n <- 100 # number of repetitions
res <- replicate(n, f1(X, m))
prod(dim(res))
#[1] 1000
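The inner for loop can also be folded into apply, so that each repetition is a single call. A short sketch of that variant; sq_diff, X_demo and res_demo are names invented for this example, and the seed is arbitrary:

```r
set.seed(1)
X_demo <- matrix(rnorm(100), 10, 10)

# squared difference of two values drawn without replacement from one column
sq_diff <- function(col) {
  y <- sample(col, 2)
  (y[1] - y[2])^2
}

# each repetition yields 10 values (one per column); replicate binds
# them into a 10 x 100 matrix
res_demo <- replicate(100, apply(X_demo, 2, sq_diff))
dim(res_demo)
length(as.vector(res_demo))   # all 1000 values as one vector
```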
