Rcpp: set the RNG state to a former state

My question is a follow-up of http://rcpp-devel.r-forge.r-project.narkive.com/qJMEsvOK/setting-the-r-random-seed-from-rcpp.
I want to be able to set the RNG state to a former state from within C++.
For example, I would like the following code to produce a matrix where each column contains the same realizations of Gamma random variables.
cppFunction('NumericMatrix rgamma_reset_seed(int n, double shape, double rate){
  RNGScope rngscope;
  Environment g = Environment::global_env();
  Environment::Binding RandomSeed = g[".Random.seed"];
  IntegerVector someVariable = RandomSeed;
  NumericMatrix results(n, 2);
  results(_,0) = rgamma(n, shape, 1/rate);
  RandomSeed = someVariable;
  results(_,1) = rgamma(n, shape, 1/rate);
  return results;
}')
m <- rgamma_reset_seed(1000, 1.2, 0.8)
par(mfrow = c(2, 1))
plot(m[,1])
plot(m[,2])
But it does not seem to work. In R, I can achieve the result with lines such as
x <- .Random.seed # store the current state
.Random.seed <- x # reset the state to x
Am I missing something obvious? Any help would be much appreciated!

This may not work (easily) -- there is some language in Writing R Extensions which states that you cannot set the seed from the C-level API.
Now, you could cheat:
1. Init the RNG from R.
2. Do some work; make sure this is wrapped by RNGScope, as our code does anyway.
3. Now cheat and use Rcpp::Function() to invoke set.seed() (see the sketch below).
4. Consider whether to go back to step 2 or to finish.
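A minimal sketch of that cheat, assuming a cppFunction wrapper (whose generated code already sets up RNGScope for you); rgamma_set_seed is a hypothetical variant of the function above:
cppFunction('NumericMatrix rgamma_set_seed(int n, double shape, double rate, int seed){
  Function set_seed("set.seed");             // look up the R function set.seed()
  NumericMatrix results(n, 2);
  set_seed(seed);                            // put the RNG in a known state
  results(_,0) = rgamma(n, shape, 1/rate);
  set_seed(seed);                            // return to that same state
  results(_,1) = rgamma(n, shape, 1/rate);
  return results;
}')
m <- rgamma_set_seed(1000, 1.2, 0.8, 42)
all.equal(m[,1], m[,2]) # should be TRUE: both columns drawn from the same state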

Related

When do I have to set a seed?

I think this is a very basic question.
I am doing simulations, so I write functions to recreate, for example, a random walk, which mathematically takes the form y_t = y_{t-1} + e_t, where e_t ~ N(0, sigma_e^2).
So to simulate it I make my function:
ar_1 <- function(iter, y0, sigma_e){
  e <- rnorm(iter, sd = sigma_e)
  y <- numeric(iter)
  y[1] <- y0
  for(t in 2:iter){
    y[t] <- y[t-1] + e[t]
  }
  result <- data.frame(iteration = seq(1, iter), y = y)
  print(plot(result$iteration, result$y, type = "l"))
  return(result)
}
try1 <- ar_1(iter = 100, y0 = 2, sigma_e = 0.0003)
So the thing is that the e vector contains random numbers.
I want to replicate the same graph and values wherever I run the code, so I know I have to set a seed.
So my question is: does the seed go inside the function or at the very start of the script?
Furthermore, I would like to know why.
If you call set.seed once at the top of the script, the seed determines the RNG state only until the first call to rnorm; every call to the RNG advances the state, so subsequent calls to functions that use random numbers will not start from the initial seed.
So really the answer is: do you intend to call the function more than once? If so, then set the seed inside the function.
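For example, here is a minimal sketch of the poster's function with a hypothetical seed argument added:
ar_1_seeded <- function(iter, y0, sigma_e, seed = 1){
  set.seed(seed)  # reset the RNG state on every call
  e <- rnorm(iter, sd = sigma_e)
  y <- numeric(iter)
  y[1] <- y0
  for(t in 2:iter){
    y[t] <- y[t-1] + e[t]
  }
  data.frame(iteration = seq(1, iter), y = y)
}
# Two calls now produce identical walks:
identical(ar_1_seeded(100, 2, 0.0003), ar_1_seeded(100, 2, 0.0003)) # TRUE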
Note that you do not need a for loop in your function. Because R is vectorized, loops can usually be avoided. Random walk values can be calculated using the base R cumsum function. For example:
set.seed(7)
y1 <- pi
rand_vals <- rnorm(10, 0, 5)
path <- c(y1, rand_vals)
walk <- cumsum(path)
rand_vals
[1] 11.4362358 -5.9838584 -3.4714626 -2.0614648 -4.8533667 -4.7363997 3.7406967 -0.5847761 0.7632881 10.9498905
path
[1] 3.1415927 11.4362358 -5.9838584 -3.4714626 -2.0614648 -4.8533667 -4.7363997 3.7406967 -0.5847761 0.7632881
[11] 10.9498905
walk
[1] 3.141593 14.577828 8.593970 5.122507 3.061043 -1.792324 -6.528724 -2.788027 -3.372803 -2.609515 8.340376

Optimization in R: what is the difference between a constructor and a wrapper?

I am going over the following code from Roger Peng's course; the author builds a function to calculate the negative log-likelihood, make.NegLogLik, using what he calls a "constructor function". Peng uses the constructor function because make.NegLogLik will later be optimized, and this way it is easier to hold a parameter constant.
I can see that the constructor function "contains" another function, that is, the function actually calculating the negative log-likelihood.
make.NegLogLik <- function(data, fixed = c(FALSE, FALSE)) {
  params <- fixed
  function(p) {
    params[!fixed] <- p
    mu <- params[1]
    sigma <- params[2]
    ## Calculate the Normal density
    a <- -0.5*length(data)*log(2*pi*sigma^2)
    b <- -0.5*sum((data-mu)^2) / (sigma^2)
    -(a + b)
  }
}
set.seed(1)
normals <- rnorm(100, 1, 2)
nLL <- make.NegLogLik(normals)
nLL
In that, it looks similar to a wrapper, for example:
toy_function <- function(x, y, z, w) {
  sum_coordinates <- sum(x + y) + sum(z + w)
  distance <- w - y
  penalty <- sum_coordinates + distance
  return(penalty)
}
wrapper <- function(args){
  toy_function(x = args[1],
               y = args[2],
               z = args[3],
               w = args[4])
}
# the function takes a vector
vec <- c(1,1,0,0)
wrapper(vec)
#> [1] 1
Question
I have looked up "constructor function" in R but all the results coming up refer to object-oriented programming. I was wondering what "constructor function" means in the context of the optimization example above.
In the first example, the return value of the function is another function. So make.NegLogLik() constructs another function.
In the second example, the return value of wrapper() is the return value of toy_function(). So you can use it directly where you would use toy_function(). Usually this is just to change the interface or set defaults for a function you don't own.
In other words, usage looks something like this. Compare the steps for the constructor vs the steps for the wrapper. For the constructor, you must actually call the returned function to get the value you want.
my_negloglik_fun <- make.NegLogLik(data = some_data)
negloglik <- my_negloglik_fun(c(param1, param2))
penalty <- wrapper(args = c(x, y, z, w))
This has a number of use cases; one example is iterating a function over many sets of inputs. Using a constructor to build that function means that you don't have to specify the other arguments when iterating. For the function constructed by make.NegLogLik(), you can test out many options for mu and sigma without having to pass the data in each time (and in fact, that is what functions like optim() want to do).
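A minimal sketch of that last point, reusing the normals simulated above; the starting values c(mu = 0, sigma = 1) are arbitrary:
nLL <- make.NegLogLik(normals)
# optim() repeatedly calls nLL with candidate (mu, sigma) values;
# the data is already captured inside nLL by the constructor
optim(c(mu = 0, sigma = 1), nLL)$par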

Why set.seed() affects sample() in R

I always thought set.seed() only made random variate generators (e.g., rnorm) produce a reproducible sequence for any specific set of input values.
However, I'm wondering why, once we call set.seed(), the function sample() no longer seems to do its job correctly.
Question
Specifically, given the example below, is there a way I can use set.seed before the rnorm call, but have sample still produce new random samples from the rnorm output each time sample is run?
Here is the R code:
set.seed(123458)
x.y <- rnorm(1e2)
sampled <- sample(x = x.y, size = 20, replace = TRUE)
plot(sampled)
As per the help file at ?set.seed:
"If called with seed = NULL it re-initializes (see ‘Note’) as if no seed had yet been set."
So, since rnorm and sample are both affected by set.seed(), you can do:
set.seed(639245)
rn <- rnorm(1e2)
set.seed(NULL)
sample(rn,5)
Instead of resetting the seed with NULL, I think it makes more sense to save the current state and restore it.
x <- .Random.seed
set.seed(639245)
rn <- rnorm(1e2)
.Random.seed <- x
sample(rn,5)
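If you need this pattern often, it can be wrapped in a small helper. This is a sketch under the assumption that the RNG has already been initialized (so .Random.seed exists in the global environment); with_temp_seed is a hypothetical name:
with_temp_seed <- function(seed, expr) {
  old <- get(".Random.seed", envir = globalenv())
  # restore the saved state however expr exits
  on.exit(assign(".Random.seed", old, envir = globalenv()))
  set.seed(seed)
  eval.parent(substitute(expr))
}
rn <- with_temp_seed(639245, rnorm(1e2))
sample(rn, 5) # uses the outer, untouched RNG state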

DEoptim does not return optimal parameters

I am trying to use DEoptim to optimize the parameters of the Heston pricing model (NMOF package). My goal is to minimize the difference between the real option price and the Heston price. However, when running my code, DEoptim does not save the best result; it always displays the value obtained with the initial parameters, not the optimized ones. Unfortunately, I'm totally new to R (and to any kind of programming), so I cannot seem to fix the problem.
My data, for one exemplary subset of an option, looks like this.
#Load data
#Real option price
C0116_P=as.vector(c(1328.700000, 1316.050000, 1333.050000, 1337.900000, 1344.800000))
#Strike price
C0116_K=as.vector(c(500, 500, 500, 500, 500))
#Time to maturity in years
C0116_T_t=as.vector(c(1.660274, 1.657534, 1.654795, 1.652055, 1.649315))
#Interest rate percentage
C0116_r=as.vector(c(0.080000, 0.080000, 0.090000, 0.090000, 0.090000))
#Dividend yield percentage
C0116_DY=as.vector(c(2.070000, 2.090000, 2.070000, 2.070000,2.060000))
#Price underlying
C0116_SP_500_P=as.vector(c(1885.08, 1872.83, 1888.03, 1892.49, 1900.53))
In the next step, I define the function I want to minimize (the difference between the real and the Heston price) and set some initial parameters. To optimize, I run a loop which, unfortunately, only returns the difference between the real option price and the Heston price under the initial parameters as the best value, and not the actual parameters that minimize the difference.
#Load packages
require(NMOF)
require(DEoptim)
#Initial parameters
v0=0.2
vT=0.2
rho=0.2
k=0.2
sigma=0.2
#Define function
error_heston <- function(x) {
  error <- P - callHestoncf(S, X, tau, r, q, v0, vT, rho, k, sigma)
  return(error)
}
#Run optimization
outDEoptim <- matrix()
for (i in 1:5) {
  #I only want the parameters v0, vT, rho, k and sigma to change. That is why I kept the others constant
  lower <- c(C0116_P[i], C0116_SP_500_P[i], C0116_K[i], C0116_T_t[i], C0116_r[i]/100, C0116_DY[i]/100, 0.0001, 0.0001, -1, 0.0001, 0.0001)
  upper <- c(C0116_P[i], C0116_SP_500_P[i], C0116_K[i], C0116_T_t[i], C0116_r[i]/100, C0116_DY[i]/100, 10, 10, 1, 10, 10)
  outDEoptim <- DEoptim(error_heston, lower, upper, DEoptim.control(VTR = 0, itermax = 100))
  print(outDEoptim$opti$bestval)
  i = i + 1
}
Any help is much appreciated!
One of the first problems is that your objective function only has one argument (the parameters to optimize), so all the other objects used inside the function must be looked up in the enclosing environments. It's better practice to pass them explicitly.
Plus, many of the necessary values aren't defined in your example (e.g. S, X, etc). All the parameters you want to optimize will be passed to your objective function via the first argument. It can help clarify things if you explicitly assign each element inside your objective function. So a more robust objective function definition is:
# Define objective function
error_heston <- function(x, P, S, K, tau, r, q) {
  v0    <- x[1]
  vT    <- x[2]
  rho   <- x[3]
  k     <- x[4]
  sigma <- x[5]
  error <- abs(P - callHestoncf(S, K, tau, r, q, v0, vT, rho, k, sigma))
  return(error)
}
Also note that I took the absolute error. DEoptim is going to minimize the objective function, so it would try to make P - callHestoncf() as negative as possible, when you want it to be close to zero instead.
You specified the box constraints upper and lower even for the parameters that don't vary. It's best to only have DEoptim generate a population for the parameters that do vary, so I removed the non-varying parameters from the box constraints. I also defined them outside the for loop.
# Only need to set bounds for varying parameters
lower <- c(1e-4, 1e-4, -1, 1e-4, 1e-4)
upper <- c( 10, 10, 1, 10, 10)
Now to the actual DEoptim call. Here is where you will pass the values for all the non-varying parameters. You set them as named arguments to the DEoptim call, as I've done below.
i <- 1
outDEoptim <- DEoptim(error_heston, lower, upper,
                      DEoptim.control(VTR = 0, itermax = 100),
                      P = C0116_P[i], S = C0116_SP_500_P[i], K = C0116_K[i],
                      tau = C0116_T_t[i], r = C0116_r[i], q = C0116_DY[i])
I only ran one iteration of the for loop, because callHestoncf() frequently throws an error when its numerical integration routine fails, which stops the optimization. You should look into the cause of that, and ask a new question if you have trouble.
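If you want the loop to survive those failures in the meantime, one option (a sketch, not part of the answer above) is to wrap the objective in tryCatch() so a failed evaluation returns a large penalty instead of stopping DEoptim:
error_heston_safe <- function(x, P, S, K, tau, r, q) {
  tryCatch(error_heston(x, P, S, K, tau, r, q),
           error = function(e) 1e10) # large penalty steers DEoptim away
}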
I also noticed you specified one of the non-varying inputs incorrectly. Your dividend yield percentages are 100 times too large. Your non-varying inputs should be:
# Real option price
C0116_P <- c(1328.70, 1316.05, 1333.05, 1337.90, 1344.80)
# Strike price
C0116_K <- c(500, 500, 500, 500, 500)
# Time to maturity in years
C0116_T_t <- c(1.660274, 1.657534, 1.654795, 1.652055, 1.649315)
# Interest rate percentage
C0116_r <- c(0.08, 0.08, 0.09, 0.09, 0.09)
# Dividend yield percentage
C0116_DY <- c(2.07, 2.09, 2.07, 2.07, 2.06) / 100
# Price underlying
C0116_SP_500_P <- c(1885.08, 1872.83, 1888.03, 1892.49, 1900.53)
As an aside, you should take a little time to format your code better. It makes it more readable, which should help you avoid typo-like errors.

Optimizing the code for error minimization

I have written the code below to minimize the error by changing the value of alpha (using an iteration method).
set.seed(16)
npoints <- 10000
Y <- round(runif(npoints), 3)
OY <- sample(c(0, 1, 0.5), npoints, replace = TRUE)
minimizeAlpha <- function(Y, OY, alpha) {
  PY <- alpha * Y
  error <- OY - PY
  squaredError <- sapply(error, function(x) x*x)
  sse <- sum(squaredError)
  return(sse)
}
# Iterate over 10000 candidate values of alpha
alphas <- seq(0.0001, 1, 0.0001)
sse <- sapply(alphas, function(x) minimizeAlpha(Y, OY, x))
print(alphas[sse == min(sse)])
I have used sapply for basic optimization. But if the number of points is more than 10000, this code runs forever. So, is there any better way to implement this, or any standard optimization technique (like bisection)? If so, can you please help me optimize the code?
Note: I need the value of alpha to at least 4 decimal places.
Any help is appreciated.
Replacing a for loop with sapply isn't more efficient; that's a misconception. It merely often gives simpler code.
However, you can actually take advantage of vectorisation in your code -- and that would be faster.
For instance, sapply(error, function(x) x*x) can simply be replaced by error * error. The sum of squared errors is thus simply sum((OY - PY)^2).
Your whole function thus boils down to:
minimizeAlpha <- function(Y, OY, alpha)
  sum((OY - alpha * Y)^2)
This should be more efficient -- but first and foremost it's better code and more readable.
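If the grid search itself is the bottleneck, here are two further options (sketches, not part of the answer above, assuming Y and OY as defined in the question): stats::optimize() does a one-dimensional search, and this particular least-squares problem even has the closed-form solution alpha = sum(OY * Y) / sum(Y * Y):
minimizeAlpha <- function(Y, OY, alpha) sum((OY - alpha * Y)^2)
# One-dimensional numerical search over [0, 1]
optimize(minimizeAlpha, interval = c(0, 1), Y = Y, OY = OY)$minimum
# Closed-form least-squares solution
sum(OY * Y) / sum(Y * Y)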
