Vectorizing code in R and stuck, but good

Here are some sample starting values for variables in the code below.
sd <- 2
sdtheory <- 1.5
meanoftheory <- 0.6
obtained <- 0.8
tails <- 2
I'm trying to vectorize the following code. It is a component of a Bayes factor calculator that was originally written by Dienes and adapted to R by Danny Kaye & Thom Baguley. This part calculates the likelihood for the theory. I've massively sped things up by vectorizing, but I can't match the output of the bit below.
area <- 0
theta <- meanoftheory - 5 * sdtheory
incr <- sdtheory / 200
for (A in -1000:1000){
  theta <- theta + incr
  dist_theta <- dnorm(theta, meanoftheory, sdtheory)
  if (identical(tails, 1)){
    if (theta <= 0){
      dist_theta <- 0
    } else {
      dist_theta <- dist_theta * 2
    }
  }
  height <- dist_theta * dnorm(obtained, theta, sd)
  area <- area + height * incr
}
area
And below is the vectorized version.
incr <- sdtheory / 200
newLower <- meanoftheory - 5 * sdtheory + incr
theta <- seq(newLower, by = incr, length.out = 2001)
dist_theta <- dnorm(theta, meanoftheory, sdtheory)
if (tails == 1){
  dist_theta <- dist_theta[theta > 0] * 2
  theta <- theta[theta > 0]
}
height <- dist_theta * dnorm(obtained, theta, sd)
area <- sum(height * incr)
area
This code exactly reproduces the results of the original when tails <- 2. Everything above should copy, paste, and give exactly the same results. However, once tails <- 1, the vectorized version no longer matches exactly. As near as I can tell, the new if statement does the equivalent of what happens in the original. Any help would be appreciated.
(I did try to create a more minimal example, stripping it down to just the loop and if statements with a handful of slices, but I couldn't get the code to fail.)

You're dropping the observation where theta == 0. That's a problem because the output of dnorm is not zero when theta == 0, so you need that observation in your output.
Rather than dropping observations, a better solution is to set those elements to zero:
incr <- sdtheory / 200
newLower <- meanoftheory - 5 * sdtheory + incr
theta <- seq(newLower, by = incr, length.out = 2001)
dist_theta <- dnorm(theta, meanoftheory, sdtheory)
if (tails == 1){
  dist_theta <- ifelse(theta < 0, 0, dist_theta) * 2
  theta[theta < 0] <- 0
}
height <- dist_theta * dnorm(obtained, theta, sd)
area <- sum(height * incr)
area

The original calculation has an error due to floating point arithmetic; adding incr each time causes theta to actually equal 7.204654e-14 when it should equal zero. So it's not actually doing the right thing on that pass through the loop; it's not executing the <= branch when it should be. Your code does do the right thing there (at least, it did with these starting values on my machine).
Your code isn't necessarily guaranteed to do the right thing every time either; what seq does is better than adding an increment over and over again, but it's still floating point arithmetic. You should really be checking to within machine tolerance of zero, perhaps using all.equal or something similar.
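For example, a tolerance-based check might look like the sketch below. Note that this implements the original theta <= 0 intent rather than reproducing the loop's floating-point quirk, and the choice of tolerance is mine, not something from the original code.
tol <- .Machine$double.eps^0.5          # the same default tolerance all.equal uses
if (tails == 1){
  near_or_below_zero <- theta <= tol    # treat anything within tolerance of zero as zero
  dist_theta <- ifelse(near_or_below_zero, 0, dist_theta * 2)
}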

Related

Trying to write a gradient descent algorithm in R

Implement the gradient descent algorithm in this question. Let {X_1, ..., X_n} be a dataset and g(x) = (1/n) * sum_{i=1}^{n} (x - X_i)^2. It is known that the mean of the dataset is the solution to the following minimization problem: min_{x in R} g(x).
To minimize g(x), you are going to use a while loop to implement the gradient descent algorithm, as follows.
Step 0. Initialize x_1 = 0.
Step 1. In the kth step, where k = 1, 2, ..., set x_{k+1} = x_k - 0.99^k * g'(x_k).
Step 2. Repeat Step 1 until |g'(x_k)| is smaller than a small tolerance level tol (e.g., set it to 1e-5) or k exceeds the maximum number of iterations Kmax (e.g., set it to 1000).
You are going to implement the gradient descent algorithm to find the mean. Use the dataset cars$speed for {X_1, ..., X_n}. You don't have to write the algorithm into a function in this question; you are going to do this in the next.
Could someone help me with this?
Here is what I have so far
data(cars)
x1 <- 0
k <- 1
toleranceLevel <- 0.00005
X <- cars$speed
kmax <- 10000
while(x1 > toleranceLevel){
  gxprime <- 2 * mean(x1 - X)
  gxprime
  x1 <- (((x1) - (.99^k)) * gxprime)
  if(x1 < toleranceLevel){
    k <- k + 1
  } else {
  }
  if(k == kmax){
    break
  }
  print(k)
}
A working version keeps track of the previous value and stops when the change between iterations drops below the tolerance (or when k reaches kmax):
data(cars)
x_old <- 0                     # Step 0: initialize
k <- 1
toleranceLevel <- 0.00005
X <- cars$speed
kmax <- 10000
err <- 1
while(err > toleranceLevel & k < kmax){
  x_new <- x_old - .99^k * 2 * mean(x_old - X)   # Step 1: x_{k+1} = x_k - 0.99^k * g'(x_k)
  err <- abs(x_new - x_old)                      # change between successive iterates
  x_old <- x_new
  k <- k + 1
}
x_new
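As a quick sanity check (my addition, not part of the original answer), the result should agree closely with the sample mean that the exercise says minimizes g(x):
x_new              # gradient descent estimate
mean(cars$speed)   # 15.4; the two should be very close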

Brownian Motion / loop in R

I want to implement the Wiener representation of Brownian motion in R (see https://en.wikipedia.org/wiki/Wiener_process#Wiener_representation); specifically the first formula. When plotting this function, the higher the dimension n of the random vector, the more it should resemble standard Brownian motion, and the lower n is, the smoother it should look.
I have tried to implement it, but I think there is a mistake somewhere in the loop, because the graphs do not look much more like Brownian motion when n is high. Even with n as high as 10000 there isn't enough fluctuation within each graph.
brownmotion <- function(n, time=1000){
  W <- rep(0, time)
  Wp1 <- rep(0, time)
  Wp2 <- 0
  X <- seq(0, 1, length.out = time)
  xsi <- rnorm(n)
  for (i in 1:length(X)){
    for (j in 1:n){
      Wp1[i] <- X[i]*xsi[1]
      Wp2 <- Wp2 + xsi[j]*sin(j*X[i]*pi)/(j*pi)
      W[i] <- Wp1[i] + sqrt(2)*Wp2
    }
  }
  return(W)
}
Since this is R, this is better done without loops:
brownmotion <- function(n, time=1000){
  X <- seq(0, 1, length.out = time)
  xsi <- rnorm(n + 1)
  W <- xsi[1] * X + sqrt(2) * colSums(xsi[-1] * sin(pi * 1:n %*% t(X)) / (pi * 1:n))
  return(W)
}
When coding this, I noticed a small error in your original code: you use xsi[1] both for the linear term and for the first sine term. I avoided this by making xsi of length n + 1, so xsi[1] can be used for the linear term and there are still n values left for the series.
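If you'd rather keep the loop version, the key fixes would be to reset the partial sum for each time point and to draw n + 1 normal variates. Here is a sketch consistent with the vectorized version above (my own, not from the original answer):
brownmotion_loop <- function(n, time = 1000){
  X <- seq(0, 1, length.out = time)
  xsi <- rnorm(n + 1)
  W <- numeric(time)
  for (i in seq_along(X)){
    Wp2 <- 0                                        # reset the series sum for every time point
    for (j in 1:n){
      Wp2 <- Wp2 + xsi[j + 1] * sin(j * pi * X[i]) / (j * pi)
    }
    W[i] <- xsi[1] * X[i] + sqrt(2) * Wp2           # linear term plus the scaled sine series
  }
  return(W)
}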

Cooley-Tukey FFT in R radix-2 DIT case

So I've been trying to (manually) implement the Cooley-Tukey FFT algorithm in R (for inputs of size N = 2^n). I tried:
myfft <- function(s){
  N <- length(s)
  if (N != 1){
    s[1:(N/2)] <- myfft(s[(1:(N/2))*2-1])
    s[(N/2+1):N] <- myfft(s[(1:(N/2))*2])
    for (k in 1:(N/2)){
      t <- s[k]
      s[k] <- t + exp(-1i*2*pi*(k-1)/N) * s[k+N/2]
      s[k+N/2] <- t - exp(-1i*2*pi*(k-1)/N) * s[k+N/2]
    }
  }
  s
}
This runs, but for n > 1 (N = 2^n) it does not compute the right values. I implemented a DFT function and used the built-in fft() function to compare; the two agree with each other (up to normalization) but disagree with my algorithm above.
If anyone is interested and sees where I went wrong, help would be greatly appreciated; I'm going mad searching for the mistake and am starting to question whether I ever understood this FFT algorithm at all.
UPDATE: I fixed it. I'm not 100% sure where exactly the problem was, but here is the working implementation:
myfft <- function(s){
  N <- length(s)
  if (N != 1){
    t <- s
    t[1:(N/2)] <- myfft(s[(1:(N/2))*2-1])   # 1 3 5 7 ...
    t[(N/2+1):N] <- myfft(s[(1:(N/2))*2])   # 2 4 6 8 ...
    s[1:(N/2)] <- t[1:(N/2)] + exp(-1i*2*pi*(0:(N/2-1))/N) * t[(N/2+1):N]
    s[(N/2+1):N] <- t[1:(N/2)] - exp(-1i*2*pi*(0:(N/2-1))/N) * t[(N/2+1):N]
  }
  return(s)
}
The problem was with the following line
s[1:(N/2)] <- myfft(s[(1:(N/2))*2-1])
which was overwriting part of the untransformed values that were needed on the subsequent line:
s[(N/2+1):N] <- myfft(s[(1:(N/2))*2])
For example, when N=4, the second call to myfft uses s[2] and s[4], but the assignment from the first call to myfft writes into s[1] and s[2] (thus overwriting the required original value in s[2]).
Your solution of copying the entire array prevents this overwrite.
An alternate solution commonly used is to copy the even and odd parts separately:
myfft <- function(s){
  N <- length(s)
  if (N != 1){
    odd <- myfft(s[(1:(N/2))*2-1])    # transform of the odd-indexed samples
    even <- myfft(s[(1:(N/2))*2])     # transform of the even-indexed samples
    s[1:(N/2)] <- odd + exp(-1i*2*pi*(0:(N/2-1))/N) * even
    s[(N/2+1):N] <- odd - exp(-1i*2*pi*(0:(N/2-1))/N) * even
  }
  return(s)
}
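A quick way to check either version (my addition, not part of the original answer) is to compare against base R's fft() on a random input whose length is a power of two:
set.seed(1)
s <- complex(real = rnorm(8), imaginary = rnorm(8))
all.equal(myfft(s), fft(s))   # should be TRUE up to numerical tolerance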

Non-conformable arguments in R

I am rewriting in R, for practice, an algorithm I originally wrote in C++, called the finite difference method. I am pretty new to R, so I don't know all the rules regarding vector/matrix multiplication. For some reason I am getting a non-conformable arguments error when I do this:
ST_u <- matrix(0,M,1)
ST_l <- matrix(0,M,1)
for(i in 1:M){
Z <- matrix(gaussian_box_muller(i),M,1)
ST_u[i] <- (S0 + delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
ST_l[i] <- (S0 - delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
}
I get this error:
Error in sqrt(T) %*% Z : non-conformable arguments
Here is my whole code:
gaussian_box_muller <- function(n){
theta <- runif(n, 0, 2 * pi)
rsq <- rexp(n, 0.5)
x <- sqrt(rsq) * cos(theta)
return(x)
}
d_j <- function(j, S, K, r, v,T) {
return ((log(S/K) + (r + (-1^(j-1))*0.5*v*v)*T)/(v*(T^0.5)))
}
call_delta <- function(S,K,r,v,T){
return (S * dnorm(d_j(1, S, K, r, v, T))-K*exp(-r*T) * dnorm(d_j(2, S, K, r, v, T)))
}
Finite_Difference <- function(S0,K,r,sigma,T,M,delta_S){
ST_u <- matrix(0,M,1)
ST_l <- matrix(0,M,1)
for(i in 1:M){
Z <- matrix(gaussian_box_muller(i),M,1)
ST_u[i] <- (S0 + delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
ST_l[i] <- (S0 - delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
}
Delta <- matrix(0,M,1)
totDelta <- 0
for(i in 1:M){
if(ST_u[i] - K > 0 && ST_l[i] - K > 0){
Delta[i] <- ((ST_u[i] - K) - (ST_l[i] - K))/(2*delta_S)
}else{
Delta <- 0
}
totDelta = totDelta + exp(-r*T)*Delta[i]
}
totDelta <- totDelta * 1/M
Var <- 0
for(i in 1:M){
Var = Var + (Delta[i] - totDelta)^2
}
Var = Var*1/M
cat("The Finite Difference Delta is : ", totDelta)
call_Delta_a <- call_delta(S,K,r,sigma,T)
bias <- abs(call_Delta_a - totDelta)
cat("The bias is: ", bias)
cat("The Variance of the Finite Difference method is: ", Var)
MSE <- bias*bias + Var
cat("The marginal squared error is thus: ", MSE)
}
S0 <- 100.0
delta_S <- 0.001
K <- 100.0
r <- 0.05
sigma <- 0.2
T <- 1.0
M <- 10
result1 <- Finite_Difference(S0,K,r,sigma,T,M,delta_S)
I can't seem to figure out the problem, any suggestions would be greatly appreciated.
In R, the %*% operator is reserved for multiplying two conformable matrices. As one special case, you can also use it to multiply a vector by a matrix (or vice versa), if the vector can be treated as a row or column vector that conforms to the matrix; as a second special case, it can be used to multiply two vectors to calculate their inner product.
However, one thing it cannot do is perform scalar multiplication. Scalar multiplication of vectors or matrices always uses the plain * operator. Specifically, in the expression sqrt(T) %*% Z, the first term sqrt(T) is a scalar and the second, Z, is a matrix. If what you intend here is to multiply the matrix Z by the scalar sqrt(T), then this should just be written sqrt(T) * Z.
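To illustrate with a small example of my own (not taken from the question's code):
Z <- matrix(1:4, 2, 2)
sqrt(2) * Z      # scalar multiplication: every element is scaled, as intended
# sqrt(2) %*% Z  # error: a 1x1 matrix cannot be matrix-multiplied with a 2x2 matrix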
When I made this change, your program still didn't work because of another bug -- S is used but never defined -- but I don't understand your algorithm well enough to attempt a fix.
A few other comments on the program not directly related to your original question:
The first loop in Finite_Difference looks suspicious: gaussian_box_muller(i) generates a vector of length i as i varies in the loop from 1 up to M, and forcing these vectors into a column matrix of length M to generate Z is probably not doing what you want. It will "reuse" the values in a cycle to populate the matrix. Try these to see what I mean:
matrix(gaussian_box_muller(1),10,1) # all one value
matrix(gaussian_box_muller(3),10,1) # cycle of three values
You also use loops in many places where R's vector operations would be easier to read and (typically) faster to execute. For example, your definition of Var is equivalent to:
Var <- sum((Delta - totDelta)^2)/M
and the definitions of Delta and totDelta could also be written in this simplified fashion.
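For instance, here is a sketch of the vectorized versions (assuming ST_u, ST_l, K, delta_S, r, T, and M are defined as in your code, and that the per-element logic is what you intended):
in_the_money <- (ST_u - K > 0) & (ST_l - K > 0)   # both bumped paths finish in the money
Delta <- ifelse(in_the_money, ((ST_u - K) - (ST_l - K)) / (2 * delta_S), 0)
totDelta <- sum(exp(-r * T) * Delta) / M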
I'd suggest Googling for "vector and matrix operations in r" or something similar and reading some tutorials. Vector arithmetic in particular is idiomatic R, and you'll want to learn it early and use it often.
You might find it helpful to consider the rnorm function to generate random Gaussians.
Happy R-ing!

How to speed up simulation in R

Below is some code that attempts to find an unknown equilibrium price distribution (I hope the details are not important - this is a game-theoretic economics problem) using a crude evolutionary algorithm (see the main loop).
However, the code runs relatively slowly, and I'm not sure how to make it more efficient. I suspect that the function profit_p1 could be written in a vectorized manner so that the sapply call in profit_all can be avoided.
I'm not a programmer and mainly use R for writing quick simulations, so please bear with me. If anybody could give me a hint, I would greatly appreciate it. Many thanks in advance!
al <- 0.4 #Parameter constraints: al >= ah > 0; al + ah < 1
ah <- 0.3 #
R0 <- al*(1-al)/(ah*(1-ah)) #Calculate relevant boundaries for R=PH/PL: R0 and R1
R1 <- (1-al)/ah #
grid <- seq(0.01, 1, 0.01) #Sets up a grid on the relevant interval [0,1] (100 points currently)
iter <- 10000000 #Maximal number of iterations
l <- length(grid) #Calculate number of points in grid
pfreq <- rep(1/l, l) #Initial guess for (symmetric) first-period price distribution: uniform over grid
#--------------------
profit_p1 <- function(p1) #Function that gives expected profit for arbitrary price in grid
{
x <- sum(pfreq[grid < p1/R0])*ah*p1 +
(1-al)*al/(1-ah) * sum(pfreq[(grid >= p1/R0 & grid < p1)]*grid[(grid >= p1/R0 & grid < p1)]) +
(al+ah)/2 * p1 * sum(pfreq[abs(grid-p1)<0.0001]) +
al*p1*sum(pfreq[grid > p1 & grid <= p1*R0]) +
(1-ah)*ah/(1-al) * sum(pfreq[(grid > p1*R0 & grid <= p1*R1)]*grid[(grid > p1*R0 & grid <= p1*R1)]) +
(1-ah)*p1 * sum(pfreq[grid > p1*R1])
return(x)
}
profit_all <- function() #Function that gives expected profit for all prices in grid
{
return(sapply(grid,profit_p1))
}
#--------------------
for(count in 1:iter) #Main loop
{
pfreq[which.max(profit_all())] <- pfreq[which.max(profit_all())] + 0.0001 # The freq. of the grid point which yields the highest expected profit is increased slightly
pfreq <- pfreq / sum(pfreq) # But of course, the total probability mass must sum up to 1
#--------------------
if(count %% 100 == 0) # Display price distribution and expected profits after every 100 iterations
{
plot(grid, profit_all(), ylim = c(0,0.5), type="l")
lines(grid, pfreq / max(pfreq)*0.4, col="orange")
}
#--------------------
}
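One easy saving, independent of vectorizing profit_p1 (a sketch of my own, not from the question): the main loop currently evaluates profit_all() twice per update, and a third time when plotting, so computing it once per iteration and reusing the result cuts that cost substantially.
for(count in 1:iter)
{
  prof <- profit_all()                  # expected profits, computed once per iteration
  best <- which.max(prof)
  pfreq[best] <- pfreq[best] + 0.0001   # nudge up the frequency of the most profitable price
  pfreq <- pfreq / sum(pfreq)           # renormalize so the probability mass sums to 1
  if(count %% 100 == 0)
  {
    plot(grid, prof, ylim = c(0,0.5), type = "l")        # profits from the start of this iteration
    lines(grid, pfreq / max(pfreq)*0.4, col = "orange")
  }
}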
