I'm currently working on the valuation of option prices for my university thesis.
We need to program some things in R. It's the first time I've worked with a programming language like R. I've been at it for the last 2 weeks, and this is how far I've got:
s <- 120
#Value of the stock today
sd <- 0.1
#standard deviation
d <- 0.003
#Drift
N <- 365
T <-1
dt <-T/N
t <- seq(0,T, length=N+1)
W <- c(0, cumsum(sqrt(dt)*rnorm(N)))
#plot( t, W, type="l", main="Wiener process", ylim=c(-1,1))
S <- s*exp(d+sd*W)
S
This is a simple generalized Wiener process, which I want to turn into a Monte Carlo simulation.
S now holds 366 (N+1) values of the stock path. What I need is a for loop that takes the last value of S and stores it in a vector, so that I can run the loop, for example, 10000 times, collect every last value of S, and take the average of that vector.
I have no idea how to program such a for loop.
I would really appreciate it if you could help me or give me some good hints.
Greetings from Germany
Christian
I never studied Wiener Processes, but I think this would be a simple outline of the code you're trying to achieve:
stock_prices <- s #Initialise vector of stock prices
numIter <- 10^4 #Set number of iterations in the for loop
for(i in 1:numIter) {
s <- stock_prices[i] #This is the current stock price (for ith iteration / time step)
#Calculate the next stock price here, call it next_price
#Add price of next iteration / time step to your vector:
stock_prices <- c(stock_prices, next_price)
}
stock_prices will be a vector of the 10,000 stock prices you simulated.
I don't know how you calculate the next stock price from S, but if you draw from the values of S randomly, then it might be useful to check out the function sample (type ?sample for help on it).
Hope that helps
If you just want to run code repeatedly, putting it in a function is nice (but not absolutely necessary). I will refer to all the code in your question as <your code>.
To make a function that runs your code,
my_function = function() {
<your code>
}
The function will, by default, return its last line, in this case S. You only want the last element of S, tail(S, 1). So we can modify the function to return only that:
my_function = function() {
<your code>
return(tail(S, 1))
}
We can then call it in a for loop n times and assign the result. It is best to pre-allocate the vector for the results so that an appropriately sized block of memory can be set aside for it up front:
n = 10000
results = rep(NA, n)
for (i in 1:n) {
results[i] <- my_function()
}
This is equivalent to
n = 10000
results = rep(NA, n)
for (i in 1:n) {
<your code>
results[i] <- tail(S, 1)
}
And, for that matter, it is also equivalent to
results = replicate(n, my_function())
which is a handy shortcut.
If you want to be fancy, you could parameterize your function:
my_nice_function = function(s = 120, sd = 0.1, d = 0.003, N = 365) {
T <- 1
dt <- T / N
t <- seq(0, T, length = N + 1)
W <- c(0, cumsum(sqrt(dt) * rnorm(N)))
S <- s * exp(d + sd * W)
return(tail(S, 1))
}
Now my_nice_function has the same default values as your code, but you can easily adjust them. For example, to run 50 simulations with sd = 0.2 you can do this:
replicate(50, my_nice_function(sd = 0.2))
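Putting the pieces together, a minimal end-to-end sketch might look like the following. The function name simulate_terminal_price, the argument T_end, the seed, and the variable names are my own assumptions; the formula is the one from the question. Since only the last value of each path is used, and the Wiener process at time T is normally distributed with variance T, the terminal price can also be drawn directly in one call, which is shown here as a cross-check against the path-based average.

```r
# Sketch: Monte Carlo average of the terminal stock price.
# simulate_terminal_price is a hypothetical name; defaults follow the question.
simulate_terminal_price <- function(s = 120, sd = 0.1, d = 0.003,
                                    N = 365, T_end = 1) {
  dt <- T_end / N
  W <- c(0, cumsum(sqrt(dt) * rnorm(N)))  # one Wiener path on [0, T_end]
  S <- s * exp(d + sd * W)                # same formula as in the question
  tail(S, 1)                              # keep only the last value
}

set.seed(1)  # assumption: fixed seed, only for reproducibility
n <- 10000
terminal_prices <- replicate(n, simulate_terminal_price())
mean_path <- mean(terminal_prices)

# Cross-check: W at T_end = 1 is N(0, 1), so draw the terminal value directly.
direct_prices <- 120 * exp(0.003 + 0.1 * rnorm(n))
mean_direct <- mean(direct_prices)

# Mean of the lognormal variable s * exp(d + sd * Z) with Z ~ N(0, 1)
mean_theory <- 120 * exp(0.003 + 0.1^2 / 2)

c(path = mean_path, direct = mean_direct, theory = mean_theory)
```

The three numbers should agree up to Monte Carlo error, which shrinks as n grows.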
I have a long vector, say x, of length 1e6 and a weight vector w of the same length. I want to find a small number (i.e., a scalar) that, when added to each element of x, makes the value of the expression shown in the code below as small as possible.
I tried looping over a vector from -1 to 1 in steps of 0.001 and taking the minimum of my expression, but this is not a good approach because I will repeat the same operation 100 times or more (and the length of x sometimes reaches 1e7 or more), which takes a long time.
getSigmoid <- function(x) {return(1 / (1 + exp(-x)))}
x <- rnorm(1e6)
w <- rnorm(1e6)
pool <- seq(-1, 1, by = 0.001)
npool <- length(pool)
result <- rep(NA, times = npool)
stime <- Sys.time()
for (i in 1:npool) {
cat("i: ", i, "/", npool, "\n")
flush.console()
result[i] <- abs(sum(getSigmoid(x + pool[i]) * w) / sum(w) - 0.5)
}
etime <- Sys.time()
(spenttime <- etime - stime)
idx_min <- which.min(result)
cat("minimal value is: ", result[idx_min], "\n")
cat("solution is: ", pool[idx_min], "\n")
I hope to find a better solution (i.e., one that improves the computation speed). I tried to think of a vectorized approach but could not figure one out. I understand that parallelization is one thing to try, but this code already runs inside a parallel function (i.e., nested parallelism may be more difficult). So a method based on vectorization or something else would be very helpful.
Instead of calculating the entire vector space and finding the minimum, you will need to use a better search method or an optimization routine.
Base R has the function optimize which can do this.
set.seed(1234)
x <- rnorm(1e6)
w <- rnorm(1e6)
stime <- Sys.time()
sumw<-sum(w) #Perform the calculation once and store
#create functions:
getSigmoid <- function(x) {return(1 / (1 + exp(-x)))}
f <-function(pool) {
abs(sum(getSigmoid(x + pool) * w) / sumw - 0.5)
}
#optimize function performs the search
print(optimize(f, c(-1, 1), tol = 0.00001))
etime <- Sys.time()
print(spenttime <- etime - stime)
Using the built-in function improves the resolution of the result and greatly improves performance. Your algorithm took about 30 seconds on my machine, while the optimize function took about 0.3 seconds, roughly a 100x improvement.
Another alternative is the non-linear minimization function nlm: use the same code as above, but substitute nlm(f, 0) for the optimize call.
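To see how the two approaches compare, here is a small self-contained sketch that runs both the grid search and optimize on a reduced problem. It assumes strictly positive weights (runif instead of the rnorm weights in the question), because then the weighted sigmoid mean increases with the shift and the objective has a single minimum; with weights of mixed sign that is not guaranteed. The names sigmoid, shift and grid_best are mine.

```r
# Sketch: grid search versus optimize() on a small problem.
set.seed(1234)
n <- 1e4
x <- rnorm(n)
w <- runif(n)   # assumption: positive weights, unlike rnorm() in the question
sumw <- sum(w)  # computed once, outside the objective

sigmoid <- function(x) 1 / (1 + exp(-x))
f <- function(shift) abs(sum(sigmoid(x + shift) * w) / sumw - 0.5)

# Brute force over the same grid as in the question
pool <- seq(-1, 1, by = 0.001)
grid_best <- pool[which.min(vapply(pool, f, numeric(1)))]

# One-dimensional minimisation over the same interval
opt <- optimize(f, c(-1, 1), tol = 1e-5)

c(grid = grid_best, optimize = opt$minimum)
```

Both values should agree to within the grid spacing, while optimize needs far fewer evaluations of f than the 2001 grid points.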
The following problem tells us to generate a Poisson process step by step from ρ (inter-arrival time), and τ (arrival time).
One of the theoretical results presented in the lectures gives the
following direct method for simulating Poisson process:
• Let τ0 = 0.
• Generate i.i.d. exponential random variables ρ1, ρ2, . . ..
• Let τn = ρ1 + . . . + ρn for n = 1, 2, . . . .
• For each k = 0, 1, . . ., let
Nt = k for τk ≤ t < τk+1.
Using this method, generate a realization of a Poisson process (Nt)t with λ = 0.5 on the interval [0, 20].
Generate 10000 realizations of a Poisson process (Nt)t with λ = 0.5 and use your results to estimate E(Nt) and Var(Nt). Compare the estimates
with the theoretical values.
My attempted solution:
First, I have generated the values of ρ using rexp() function in R.
rhos <-function(lambda, max1)
{
vec <- vector()
for (i in 1:max1)
{
vec[i] <- rexp(0.5)
}
return (vec)
}
then, I created τs by progressive summing of ρs.
taos <- function(lambda, max)
{
rho_vec <- rhos(lambda, max)
#print(rho_vec)
vec <- vector()
vec[1] <- 0
sum <- 0
for(i in 2:max)
{
sum <- sum + rho_vec[i]
vec[i] <- sum
}
return (vec)
}
The following function is for finding the value of Nt=k when the value of k is given. Say, it is 7, etc.
Ntk <- function(lambda, max, k)
{
tao_vec <- taos(lambda, max)
val <- max(tao_vec[tao_vec < k])
}
y <- taos(0.5, 20)
x <- seq(0, 20-1, by=1)
plot(x,y, type="s")
Output:
As you can see, the plot of the Poisson process is blank rather than a staircase.
If I change rexp to exp, I get the following output:
.. which is a staircase function but all steps are equal.
Why is my source code not producing the expected output?
It looks like you're using max1 to indicate how many times to sample the exponential distribution in your rhos function. I would recommend something like this:
rhosGen <- function(lambda, maxTime){
rhos <- NULL
i <- 1
while(sum(rhos) < maxTime){
samp <- rexp(n = 1, rate = lambda)
rhos[i] <- samp
i <- i+1
}
return(head(rhos, -1))
}
This will continue to sample from the exponential distribution until the sum of the holding times exceeds the length of the given interval. head then removes the last sample, so that all of the events we keep track of definitely occur within our time interval of interest.
From here you have to generate the taos by summing the previous holding times (rhos):
taosGen <- function(lambda, maxTime){
rhos <- rhosGen(lambda, maxTime)
# arrival times are the running sums of the holding times;
# cumsum also copes with the case where no event falls in the interval
taos <- cumsum(rhos)
return(taos)
}
Now that you have the taos we know at what time each event in the time interval (0,maxTime) occurs. This leads us to generating the associated Poisson Process by finding the value of the Nt for each t in the time interval:
ppGen <- function(lambda, maxTime){
taos <- taosGen(lambda, maxTime)
pp <- NULL
for(i in 1:maxTime){
pp[i] <- sum(taos <= i)
}
return(pp)
}
This generates the value of the Poisson process at each integer time in the interval. I suspect that part of your issue was trying to put the tao values on the y-axis instead of the count of events that had already occurred. The following code worked for me to produce a random-looking staircase, similar to your example.
y <- ppGen(0.5, 20)
x <- seq(1, 20, by=1) # ppGen returns N_t at t = 1, ..., 20
plot(x,y, type="s")
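For the second part of the exercise (10000 realizations, then estimating E(Nt) and Var(Nt)), a minimal sketch could look like this. It only counts arrivals up to the end of the interval, using the same direct method of summing exponential holding times. The helper name count_arrivals, the seed, and the choice of t = 20 are my own assumptions. For a Poisson process the theoretical mean and variance of Nt are both λt, which is 10 here.

```r
# Sketch: estimate E(N_t) and Var(N_t) at t = 20 from 10000 realizations.
# count_arrivals is a hypothetical helper, not part of the answer above.
count_arrivals <- function(lambda, t_end) {
  total <- 0  # running arrival time tau_k
  k <- 0      # number of arrivals seen so far
  repeat {
    total <- total + rexp(1, rate = lambda)  # add the next holding time
    if (total > t_end) break                 # next arrival is past the interval
    k <- k + 1
  }
  k
}

set.seed(42)  # assumption: fixed seed, only for reproducibility
lambda <- 0.5
t_end <- 20
n_rep <- 10000

counts <- replicate(n_rep, count_arrivals(lambda, t_end))

c(mean = mean(counts), var = var(counts), theory = lambda * t_end)
```

The sample mean and variance should both land close to the theoretical value, with the variance estimate being the noisier of the two.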
Here's another possible implementation. The idea is to generate a vector of wait times (tau), and plot that against the list of events we're waiting for (max1)
poi.process <- function(lambda,n){
# initialize vector of total wait time for the arrival of each event:
s<-numeric(n+1)
# set S_0 = 0
s[1] <-0
# generate vector of iid Exp random variables in a single call:
x <- rexp(n, rate = lambda)
# the cumulative sums are the total wait times until each arrival:
s[-1] <- cumsum(x)
# return vector of wait time
return(s)
}
Plotting it using stepfun will get us something like this:
n<-20
lambda <-3
# simulate list of wait time:
s_list <-poi.process(lambda,n)
# plot the counting process: the count jumps by 1 at each arrival time
plot(stepfun(s_list[-1], 0:n),
do.points = TRUE,
pch = 16,
col.points = "red",
verticals = FALSE,
main = 'Realization of a Poisson process with lambda = 3',
xlab = 'Time of arrival',
ylab = 'Number of arrivals')
Sample Poisson process:
I have an event A that is triggered when the majority of coin tosses in a series of tosses comes up heads. I have an unfair coin and I'd like to see how the likelihood of A changes as the number of tosses change and the probability in each toss changes.
This is my function assuming 3 tosses
n <- 3
#victory requires majority of tosses heads
#tosses only occur in odd intervals
k <- seq(n/2+.5,n)
victory <- function(n,k,p){
for (i in p) {
x <- 0
for (i in k) {
x <- x + choose(n, k) * p^k * (1-p)^(n-k)
}
z <- x
}
return(z)
}
p <- seq(0,1,.1)
victory(n,k,p)
My hope is the victory() function would:
find the probability of each of the outcomes where the majority of tosses are heads, given a particular value p
sum up those probabilities and add them to a vector z
go back and do the same thing given another probability p
I tested this with n <- 3, k <- c(2,3) and p <- c(.5,.75), and the output was 0.75000, 0.84375. I know that the output should've been 0.625, 0.0984375.
I wasn't able to get exactly the result you wanted, but maybe can help you along a bit.
When looping in R the vector you are looping through remains unchanged and value you are using to loop changes. For example see the differences in these loops:
test <- seq(0,1,length.out = 5)
for ( i in test){
print(test)
}
for ( i in test){
print(i)
}
for ( i in 1:length(test)){
print(test[i])
}
When you iterate, you first set i to the first number in p, then to the first number in k, and then use the unchanged vectors.
You also assign to z in the first loop over p and then overwrite it in the second.
Try the code below. I am still not getting the answer you quote, but it might help you find where the error is (printing values along the way or using debug(victory) might also be helpful):
victory <- function(n,k,p){
z <- numeric(length(p)) # preallocate a numeric vector, one entry per value of p
for (i in 1:length(p)) {
x <- 0
for (j in 1:length(k)) {
x <- x + choose(n, k[j]) * p[i]^k[j] * (1-p[i])^(n-k[j])
}
z[i] <- x
}
return(z)
}
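As a cross-check on the loop version, the same probability can be computed with R's built-in binomial distribution function. This is only a sketch: the name victory_binom is hypothetical, and it assumes "majority" means strictly more than half of the n tosses. For n = 3 it gives 0.5 at p = 0.5 and 0.84375 at p = 0.75, which suggests that the expected values quoted in the question may themselves be off.

```r
# Sketch: P(majority of n tosses are heads) via the binomial distribution.
# victory_binom is a hypothetical name used for illustration.
victory_binom <- function(n, p) {
  k_min <- floor(n / 2) + 1  # smallest number of heads that is a majority
  # P(X >= k_min) = P(X > k_min - 1); pbinom is vectorised over p
  pbinom(k_min - 1, size = n, prob = p, lower.tail = FALSE)
}

victory_binom(3, c(0.5, 0.75))  # 0.5 and 0.84375

# The whole curve over a grid of p values, as in the question
p <- seq(0, 1, 0.1)
victory_binom(3, p)
```

Because pbinom is vectorised over prob, no explicit loop over p is needed.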
I don't have a background in programming (apart from wrestling with R to get things done), and I'm trying to put into words what the function for the greatest common divisor in the R {numbers} package does at each step. I need help understanding the flow of steps within the function:
function (n, m)
{
stopifnot(is.numeric(n), is.numeric(m))
if (length(n) != 1 || floor(n) != ceiling(n) || length(m) !=
1 || floor(m) != ceiling(m))
stop("Arguments 'n', 'm' must be integer scalars.")
if (n == 0 && m == 0)
return(0)
n <- abs(n)
m <- abs(m)
if (m > n) {
t <- n
n <- m
m <- t
}
while (m > 0) {
t <- n
n <- m
m <- t%%m
}
return(n)
}
<environment: namespace:numbers>
For instance, in the if (m > n) {} part, n becomes t and ultimately becomes m? I'm afraid to ask, because it may be painfully obvious, but I don't know what is going on. The same applies, I guess, to the rest of the function, with %% perhaps being modulo.
What it says is:
Stop if either m or n are not numeric, more than one number, or have decimals, and return the message, "Arguments 'n', 'm' must be integer scalars."
If they both are zero, return zero.
Using absolute values from now on.
Make sure that n >= m, because of the algorithm we'll apply in the next step. If this is not the case, swap them: first place n in a temporary variable t, then assign m to n, so that the larger number now comes first in the (n, m) pair. At this point both n and m contain the original m. Finish by retrieving the value from the temporary variable and assigning it to m.
Now they apply the modified Euclidean algorithm to find the GCD - a more efficient version of the algorithm that shortcuts the multiple subtractions, instead replacing the larger of the two numbers by its remainder when divided by the smaller of the two.
The smaller number at the beginning of the algorithm will end up being the larger after the first iteration, therefore we'll assign it to n to get ready for the second iteration. To do so, though, we need to get the current n out of the way by assigning it to the temporary variable t. After that we get the modulo resulting from dividing the original larger number (n), which now is stored in t, by the smaller number m. The result will replace the number stored in m.
As long as there is a remainder (modulo) the process will go on, this time with the initial smaller number, m playing the role of the big guy. When there is no remainder, the smaller of the numbers in that particular iteration is returned.
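The steps described above can be watched in action with a small tracing version of the loop. This is a sketch with a hypothetical name, gcd_trace, and it leaves out the input checks of the packaged function; it prints n, m and the remainder on every pass through the while loop.

```r
# Sketch: Euclidean algorithm with a printed trace (hypothetical helper).
gcd_trace <- function(n, m) {
  n <- abs(n)
  m <- abs(m)
  if (m > n) {  # swap so that the larger number is in n
    t <- n
    n <- m
    m <- t
  }
  while (m > 0) {
    cat("n =", n, " m =", m, " remainder =", n %% m, "\n")
    t <- n       # park the larger number
    n <- m       # the smaller number becomes the new larger one
    m <- t %% m  # the remainder becomes the new smaller one
  }
  n
}

gcd_trace(18, 48)
```

For the inputs 18 and 48, the pairs (n, m) visited are (48, 18), (18, 12) and (12, 6). The last remainder is 0, so the loop stops and 6 is returned.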
ADDENDUM:
Now that I know how to read this function, I see that it is limited to two numbers in the input to the function. So I entertained myself putting together a function that can work with three integers in the input:
require(numbers)
GCF <- function(x,y,z){
tab.x <- tabulate(primeFactors(x))
tab.y <- tabulate(primeFactors(y))
tab.z <- tabulate(primeFactors(z))
max.len <- max(length(tab.x), length(tab.y), length(tab.z))
tab_x = c(tab.x, rep(0, max.len - length(tab.x)))
tab_y = c(tab.y, rep(0, max.len - length(tab.y)))
tab_z = c(tab.z, rep(0, max.len - length(tab.z)))
GCD_elem <- numeric()
for(i in 1:max.len){
GCD_elem[i] <- i^min(tab_x[i], tab_y[i], tab_z[i]) # prime i raised to its smallest shared exponent
}
GCD_elem <- GCD_elem[!GCD_elem==0]
GrCD <- prod(GCD_elem)
print(GrCD)
}
Also for the LCM:
LCM <- function(x,y,z){
tab.x <- tabulate(primeFactors(x))
tab.y <- tabulate(primeFactors(y))
tab.z <- tabulate(primeFactors(z))
max.len <- max(length(tab.x), length(tab.y), length(tab.z))
tab_x = c(tab.x, rep(0, max.len - length(tab.x)))
tab_y = c(tab.y, rep(0, max.len - length(tab.y)))
tab_z = c(tab.z, rep(0, max.len - length(tab.z)))
LCM_elem <- numeric()
for(i in 1:max.len){
LCM_elem[i] <- i^(max(tab_x[i], tab_y[i], tab_z[i]))
}
LCM_elem <- LCM_elem[!LCM_elem==0]
LCM <- prod(LCM_elem)
print(LCM)
}
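An alternative that avoids prime factorisation is to apply a two-argument GCD repeatedly, since gcd(x, y, z) equals gcd(gcd(x, y), z), and likewise for the LCM. The sketch below uses only base R; the names gcd2, lcm2, gcd_many and lcm_many are hypothetical and not part of the numbers package.

```r
# Sketch: GCD and LCM of any number of integers via pairwise reduction.
gcd2 <- function(a, b) {
  a <- abs(a)
  b <- abs(b)
  while (b > 0) {  # Euclidean algorithm; also swaps a and b if a < b
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

lcm2 <- function(a, b) {
  if (a == 0 || b == 0) return(0)
  abs(a * b) / gcd2(a, b)
}

gcd_many <- function(...) Reduce(gcd2, c(...))
lcm_many <- function(...) Reduce(lcm2, c(...))

gcd_many(12, 18, 24)  # 6
lcm_many(12, 18, 24)  # 72
```

Reduce folds the two-argument function over the inputs from left to right, so the same code works for two, three or more integers.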
I would like to implement a simulation program, which requires the following structure:
It has a for loop, and the program generates a vector in each iteration. I need each generated vector to be appended to the existing vector.
I do not know how to do this in R. Thanks for the help.
These answers work, but they all require a call to a non-deterministic function like sample() in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for loop. The trick is to use the n argument and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n calls, which matters if your n is large. Here is a quick example random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)
steps <- 30
n <- 100
directions <- c(-1, 1)
results <- vector('list', n)
for (i in seq_len(n)) {
walk <- numeric(steps)
for (s in seq_len(steps)) {
walk[s] <- sample(directions, 1)
}
results[[i]] <- sum(walk)
}
We can rewrite this with one call to sample():
all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)
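As a small variation on the same idea, the row sums can also be taken with rowSums(), which does the summation in compiled code instead of calling sum() once per row. This sketch is my own addition; the seed and the names walks_apply and walks_rowsums are assumptions.

```r
# Sketch: one call to sample(), then row sums computed two ways.
set.seed(1)  # assumption: fixed seed, only for reproducibility
steps <- 30
n <- 100
directions <- c(-1, 1)

# Reshape the draws so that each row holds the steps of one walk
all.steps <- matrix(sample(directions, n * steps, replace = TRUE),
                    nrow = n, ncol = steps)

walks_apply <- apply(all.steps, 1, sum)
walks_rowsums <- rowSums(all.steps)  # same values, without an R-level loop

all.equal(walks_apply, walks_rowsums)
```

Each entry is the end point of one 30-step walk, so it is an even number between -30 and 30.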
Proof of speed increase (n=10000):
> system.time({
+ for (i in seq_len(n)) {
+ walk <- numeric(steps)
+ for (s in seq_len(steps)) {
+ walk[s] <- sample(directions, 1)
+ }
+ results[[i]] <- sum(walk)
+ }})
user system elapsed
4.231 0.332 4.758
> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
user system elapsed
0.010 0.001 0.012
If your simulation needs just one random variable per simulation function call, use sapply(), or better yet mclapply() (originally from the multicore package, now part of the parallel package that ships with R). Revolution Analytics's foreach package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).
Take home points:
Preallocate with numeric(n) or vector('list', n)
Push invariant code out of for loops. Cleverly push stochastic functions out of code with their n argument.
Try hard for sapply() or lapply(), or better yet mclapply.
Don't use x <- c(x, rnorm(100)). Every time you do this, a member of R-core kills a puppy.
Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.
n <- 10
start <- vector("list", n)
for (i in 1:n) {
start[[i]] <- sample(10)
}
start <- unlist(start)
You could do it the old nasty way. This may be slow for larger vectors.
start <- c()
for (i in 1:n) {
add <- sample(10)
start <- c(start, add)
}
x <- rnorm(100)
for (i in 1:100) {
x <- c(x, rnorm(100))
}
This link should be useful: http://www.milbo.users.sonic.net/ra/
Assuming your simulation function -- call it func -- returns a vector with the same length each time, you can store the results in the columns of a pre-allocated matrix:
sim1 <- function(reps, func) {
first <- func()
result <- matrix(first, nrow=length(first), ncol=reps)
for (i in seq_len(reps)[-1]) { # columns 2..reps; empty when reps is 1
result[, i] <- func()
}
return(as.vector(result))
}
Or you could express it as follows using replicate:
sim2 <- function(reps, func) {
return(as.vector(replicate(reps, func(), simplify=TRUE)))
}
> sim2(3, function() 1:3)
[1] 1 2 3 1 2 3 1 2 3
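If the simulation function does not return the same length each time, the matrix approach no longer applies, but a preallocated list that is flattened once at the end still works. The following is a sketch under that assumption; sim3 and the example simulation step are hypothetical.

```r
# Sketch: collect vectors of varying length, then flatten once at the end.
sim3 <- function(reps, func) {
  pieces <- vector("list", reps)  # preallocate one slot per iteration
  for (i in seq_len(reps)) {
    pieces[[i]] <- func()
  }
  unlist(pieces)  # a single concatenation instead of one per iteration
}

sim3(3, function() 1:3)

set.seed(1)  # assumption: fixed seed, only for reproducibility
# hypothetical simulation step: between 1 and 3 uniform draws per call
out <- sim3(5, function() runif(sample(1:3, 1)))
length(out)  # somewhere between 5 and 15
```

The first call reproduces the sim2 result shown above. The second shows that the pieces may differ in length without any change to the collecting code.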