I (being an absolute beginner with R and programming) have to do some analysis with R for my thesis in finance :(
The purpose is to simulate data (stock prices) with a GBM and run 2 trading strategies over the results. Within the GBM I'll have to "play" with the variables "r" and "sigma" (3 different values for each, thus 9 combinations). Each combination needs to be simulated 10000 times over a period T=10, N=250. To all these simulations, 2 trading strategies have to be applied - MACD and RSI (from the TTR package).
Now I'm facing an issue with writing the code :(
#Geometric Brownian Motion
library(sde) # for GBM()
library(TTR) # for ROC() and RSI()
x <- 100
r <- 0
sigma <- 1
T <- 10
N <- 250
Sim <- GBM(x, r, sigma, T, N)
#Additional info for RSI strategy
retSim <- ROC(Sim)
SimRSI <- RSI(Sim, n = 14)
SimRSI[is.na(SimRSI)] <- 0
#Create a vector for the results of the inner loop
portfolio <- rep(0, N + 1) # GBM() returns N + 1 prices
portfolio[1] <- 100
runs <- 10000
#Vectors for the final results after 10000 runs (only the last value of each simulated path and of each portfolio is required)
resultsSimGBM <- rep(0, runs)
resultsRSI <- rep(0, runs) # results for the combination r = 0, sigma = 1
#orders (computed inside the inner loop below, for each day i):
# buyRSI - stay invested and earn the day's return
# holdRSI - stay in cash
#Simulation: outer loop runs one GBM path per iteration
for(j in 1:runs){
Sim <- GBM(x, r, sigma, T, N)
retSim <- ROC(Sim)
SimRSI <- RSI(Sim, n = 14)
SimRSI[is.na(SimRSI)] <- 0
portfolio[1] <- 100
#Inner loop: apply the RSI strategy along the path
for(i in 2:length(Sim)){
#Apply RSI on GBM
buyRSI<-portfolio[i-1]*exp(retSim[i])
holdRSI<-portfolio[i-1]
if(SimRSI[i-1]<50 && SimRSI[i]>50){portfolio[i]=buyRSI}
if(SimRSI[i-1]>50 && SimRSI[i]<50){portfolio[i]=holdRSI}
if(SimRSI[i-1]<50 && SimRSI[i]<50){portfolio[i]=holdRSI}
if(SimRSI[i-1]>50 && SimRSI[i]>50){portfolio[i]=buyRSI}
}
resultsRSI[j] <- portfolio[length(Sim)]
resultsSimGBM[j] <- Sim[length(Sim)]
}
Anyway, this is what I have so far and it seems to work. However, I also need to include the second strategy in the inner loop; on its own, it currently looks as follows:
#MACD strategy
portfolioMACD <- rep(0, N + 1)
portfolioMACD[1] <- 100
resultsMACD <- rep(0, runs)
for(j in 1:runs){
Sim <- GBM(x, r, sigma, T, N) # one simulated GBM path
retSim <- ROC(Sim)
SimMACD <- MACD(Sim, 12, 26, 9, maType = "EMA")
DataSimMACD <- data.frame(SimMACD)
DataSimMACD$macd[is.na(DataSimMACD$macd)] <- 0
DataSimMACD$signal[is.na(DataSimMACD$signal)] <- 0
for(i in 2:length(Sim)){
buyMACD<-portfolioMACD[i-1]*exp(retSim[i])
sellMACD<-portfolioMACD[i-1]
holdMACD<-portfolioMACD[i-1]*exp(retSim[i])
if(DataSimMACD$macd[i-1]<DataSimMACD$signal[i-1] && DataSimMACD$macd[i]>DataSimMACD$signal[i]){portfolioMACD[i]=buyMACD}
if(DataSimMACD$macd[i-1]>DataSimMACD$signal[i-1] && DataSimMACD$macd[i]<DataSimMACD$signal[i]){portfolioMACD[i]=sellMACD}
if(DataSimMACD$macd[i-1]>DataSimMACD$signal[i-1] && DataSimMACD$macd[i]>DataSimMACD$signal[i]){portfolioMACD[i]=holdMACD}
if(DataSimMACD$macd[i-1]<DataSimMACD$signal[i-1] && DataSimMACD$macd[i]<DataSimMACD$signal[i]){portfolioMACD[i]=sellMACD}
if(DataSimMACD$macd[i]==DataSimMACD$signal[i]){portfolioMACD[i]=sellMACD}
if(DataSimMACD$macd[i-1]==DataSimMACD$signal[i-1] && DataSimMACD$macd[i]!=DataSimMACD$signal[i]){portfolioMACD[i]=buyMACD}
}
resultsMACD[j] <- portfolioMACD[length(Sim)]
}
BASICALLY:
1 - One Brownian motion has to consist of 2500 elements, to which both trading strategies have to be applied separately.
2 - This whole procedure has to be repeated 10000 times for each of the 9 combinations of the variables r and sigma (r1sigma1, r1sigma2, r1sigma3, ..., r3sigma3). (This isn't in my code yet - I have no clue how to construct those 2 loops around it all...) :(
3 - The end result should be a 10000x27 matrix with 10000 rows (one per run) and 27 columns (9x GBM, RSI, MACD), filled only with the 2500th (end) value of each simulation from point 1 --> how do I do that?
SOS! Could someone PLEASE PLEASE PLEASE help me with this mess? I'm completely lost and it's my graduation paper -.-
Every help will be highly praised and deeply appreciated!
Thanks in advance and sorry for the long post.
Cheers from Berlin, Ana :)
EDIT AND ANOTHER SIMPLIFIED EXAMPLE
library(sde)
library(TTR)
#Parameters
x <- 100
r <- 0
sigma <- 1
T <- 1
N <- 20
runs <- 50 #how often the simulation of the GBM should be repeated
#Vectors for results
Returns <- rep(0, N + 1)
LogReturns <- rep(0, N + 1)
Simulation <- rep(0, N + 1)
ResultsSimulation <- rep(0, runs)
ResultsReturns <- rep(0, runs)
ResultsLog <- rep(0, runs)
#second loop: one simulation per run
for(j in 1:runs){
Simulation <- GBM(x, r, sigma, T, N)
Returns <- ROC(Simulation)
#first loop: log returns step by step
for(i in 1:N){
LogReturns[i + 1] <- log(Simulation[i + 1] / Simulation[i])
}
ResultsSimulation[j] <- Simulation[N + 1]
ResultsReturns[j] <- Returns[N + 1]
ResultsLog[j] <- LogReturns[N + 1]
}
ResultsMatrix <- as.matrix(data.frame(ResultsSimulation, ResultsReturns, ResultsLog))
The structure of this example is basically what I have. I need to construct 2 more loops around it which do the same simulations and calculations for 3 different "r" values and 3 different "sigma" values (variables within the GBM function). The results (the final value of each simulation and calculation from the first loop) should be saved in separate vectors, or in a matrix consisting of those --> thus 27 vectors of length 50 (3 results for each combination of the variables r and sigma).
For example: sigma = 0.1, 0.3, 0.6 and r = 0.03, 0.05, 0.08.
How to construct those loops around it all and save the data accordingly?
Sorry for the questions guys, but I'm really lost :(
Cheers and thanks a lot in advance! At least for reading ;)
Is this close to what you need? If so, you can add new trading functions to return a portfolio, then just call it (see the example):
warning: It took about 1.5 hours to run with N_SIMS = 100000!
get.simulation.GBM = function(TIME = 10, N = 250, N_SIMS = 1000, x0 = 100, sigma = c(0.1, 0.3, 0.6), r = c(0.03, 0.05, 0.08))
{
require(sde)
params = expand.grid(sigma = sigma, r = r)
# pre-allocate matrix
results = matrix(0, ncol = N_SIMS * nrow(params), nrow = N)
for (i in 1:nrow(params))
{
idx.range = ((i - 1)*N_SIMS + 1):((i - 1)*N_SIMS + N_SIMS)
# GBM() is called with N - 1 steps so that each path has exactly N points
temp.res = replicate(N_SIMS, GBM(x0, r = params[i, 'r'], sigma = params[i, 'sigma'], T = TIME, N = N - 1))
results[, idx.range] = temp.res
}
return(results)
}
apply.MACD = function(serie, nFast = 12, nSlow = 26, nSig = 9, p0 = 100)
{
require(TTR)
roc = ROC(serie)
sim.MACD = MACD(serie, nFast, nSlow, nSig, maType = "EMA")
portfolio = rep(0, length = length(serie))
portfolio[1] = p0
sim.MACD[is.na(sim.MACD)] = 0
sim.MACD = as.data.frame(sim.MACD)
for (i in 2:length(serie))
{
buy = portfolio[i - 1] * exp(roc[i])
sell = portfolio[i - 1]
hold = buy
if(sim.MACD$macd[i - 1] < sim.MACD$signal[i - 1] && sim.MACD$macd[i] > sim.MACD$signal[i]){portfolio[i] = buy}
if(sim.MACD$macd[i - 1] > sim.MACD$signal[i - 1] && sim.MACD$macd[i] < sim.MACD$signal[i]){portfolio[i] = sell}
if(sim.MACD$macd[i - 1] > sim.MACD$signal[i - 1] && sim.MACD$macd[i] > sim.MACD$signal[i]){portfolio[i] = hold}
if(sim.MACD$macd[i - 1] < sim.MACD$signal[i - 1] && sim.MACD$macd[i] < sim.MACD$signal[i]){portfolio[i] = sell}
if(sim.MACD$macd[i] == sim.MACD$signal[i]){portfolio[i] = sell}
if(sim.MACD$macd[i - 1] == sim.MACD$signal[i - 1] && sim.MACD$macd[i] != sim.MACD$signal[i]){portfolio[i] = buy}
}
return(portfolio)
}
apply.RSI = function(serie, p0 = 100, n = 14)
{
require(TTR)
roc = ROC(serie)
sim.RSI = RSI(serie, n = n)
sim.RSI[is.na(sim.RSI)] = 0
portfolio = rep(0, length = length(serie))
portfolio[1] = p0
for (i in 2:length(serie))
{
buy = portfolio[i - 1] * exp(roc[i])
hold = portfolio[i - 1]
if(sim.RSI[i - 1] < 50 && sim.RSI[i] > 50){portfolio[i] = buy}
if(sim.RSI[i - 1] > 50 && sim.RSI[i] < 50){portfolio[i] = hold}
if(sim.RSI[i - 1] < 50 && sim.RSI[i] < 50){portfolio[i] = hold}
if(sim.RSI[i - 1] > 50 && sim.RSI[i] > 50){portfolio[i] = buy}
}
return(portfolio)
}
# Example (this is SLOW)
simulation.matrix = get.simulation.GBM()
portfolio.RSI = apply(simulation.matrix, 2, apply.RSI)
portfolio.MACD = apply(simulation.matrix, 2, apply.MACD)
# if you need only the last values
portfolio.RSI.last = tail(portfolio.RSI, 1)
portfolio.MACD.last = tail(portfolio.MACD, 1)
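If you also need the runs x 27 matrix from requirement 3 (one column per parameter combination for the GBM end values, the RSI portfolios and the MACD portfolios), here is a minimal sketch. It assumes the column layout produced by get.simulation.GBM() above, i.e. N_SIMS consecutive columns per (sigma, r) combination, and that N_SIMS matches the value used in the simulation:
N_SIMS = 1000 # must match the value passed to get.simulation.GBM()
# reshape each 1 x (N_SIMS * 9) row of end values into an N_SIMS x 9 matrix,
# one column per (sigma, r) combination (matrix() fills column-wise)
last.GBM = matrix(tail(simulation.matrix, 1), nrow = N_SIMS)
last.RSI = matrix(portfolio.RSI.last, nrow = N_SIMS)
last.MACD = matrix(portfolio.MACD.last, nrow = N_SIMS)
results.final = cbind(last.GBM, last.RSI, last.MACD) # N_SIMS x 27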
Related
I would like to compute the Area Under the Curve defined by a set of experimental values. I created a function to calculate an approximation of the AUC using Simpson's rule, as I saw in this post. However, the function only works when it receives a vector of odd length. How can I modify the code to add the area of the last trapezoid when the input vector has an even length?
AUC <- function(x, h=1){
# AUC computes the Area Under the Curve of a time series using
# Simpson's Rule (numerical method).
# https://link.springer.com/chapter/10.1007/978-1-4612-4974-0_26
# Arguments
# x: (vector) time series values
# h: (numeric) temporal resolution of the time series. default h=1
n = length(x)-1
xValues = seq(from=1, to=n, by=2)
sum <- list()
for(i in 1:length(xValues)){
n_sub <- xValues[[i]]-1
n <- xValues[[i]]
n_add <- xValues[[i]]+1
v1 <- x[[n_sub+1]]
v2 <- x[[n+1]]
v3 <- x[[n_add+1]]
s <- (h/3)*(v1+4*v2+v3)
sum <- append(sum, s)
}
sum <- unlist(sum)
auc <- sum(sum)
return(auc)
}
Here is a data example:
smoothed = c(0.3,0.317,0.379,0.452,0.519,0.573,0.61,0.629,0.628,0.613,0.587,0.556,0.521,
0.485,0.448,0.411,0.363,0.317,0.273,0.227,0.185,0.148,0.12,0.103,0.093,0.086,
0.082,0.079,0.076,0.071,0.066,0.059,0.053,0.051,0.052,0.057,0.067,0.081,0.103,
0.129,0.165,0.209,0.252,0.292,0.328,0.363,0.398,0.431,0.459,0.479,0.491,0.494,
0.488,0.475,0.457,0.43,0.397,0.357,0.316,0.285,0.254,0.227,0.206,0.189,0.181,
0.171,0.157,0.151,0.162,0.192,0.239)
One recommended way to handle an even number of points and still achieve precision is to combine Simpson's 1/3 rule with Simpson's 3/8 rule, which can handle an even number of points. Such approaches can be found in engineering textbooks on numerical methods.
However, as a practical matter, you can write a code chunk to check the data length and add a single trapezoid at the end, as was suggested in the last comment of the post you linked to. I wouldn't assume that it is necessarily as precise as combining the 1/3 and 3/8 rules, but it is probably reasonable for many applications.
I would double-check my code edits below, but this is the basic idea.
AUC <- function(x, h=1){
# AUC computes the Area Under the Curve of a time series using
# Simpson's Rule (numerical method).
# https://link.springer.com/chapter/10.1007/978-1-4612-4974-0_26
# Arguments
# x: (vector) time series values
# h: (numeric) temporal resolution of the time series. default h=1
#jh edit: check for even data length
#and chop off last data point if even
nn = length(x)
if(length(x) %% 2 == 0){
xlast = x[length(x)]
x = x[-length(x)]
}
n = length(x)-1
xValues = seq(from=1, to=n, by=2)
sum <- list()
for(i in 1:length(xValues)){
n_sub <- xValues[[i]]-1
n <- xValues[[i]]
n_add <- xValues[[i]]+1
v1 <- x[[n_sub+1]]
v2 <- x[[n+1]]
v3 <- x[[n_add+1]]
s <- (h/3)*(v1+4*v2+v3)
sum <- append(sum, s)
}
sum <- unlist(sum)
auc <- sum(sum)
##jh edit: add trapezoid for last two data points to result
if(nn %% 2 == 0){
auc <- auc + (x[length(x)] + xlast)/2 * h
}
return(auc)
}
sm = smoothed[-length(smoothed)]
length(sm)
[1] 70
#even data as an example
AUC(sm)
[1] 20.17633
#original odd data
AUC(smoothed)
[1] 20.389
There may be a good reason for you to prefer Simpson's rule, but if you're just looking for a quick and efficient estimate of the AUC, the trapezoid rule is far easier to implement and imposes no constraint on the number of points:
AUC <- function(x, h = 1) sum((x[-1] + x[-length(x)]) / 2 * h)
AUC(smoothed)
#> [1] 20.3945
Here, I show example code that uses the Simpson's 1/3 and 3/8 rules in tandem for the numerical integration of data. As always, the usual caveats about the possibility of coding errors or compatibility issues apply.
The output at the end compares the numerical estimates of this algorithm and of the trapezoidal rule against a precise reference value obtained with R's "integrate" function.
#Algorithm adapted from:
#Numerical Methods for Engineers, Seventh Edition,
#By Chapra and Canale, page 623
#Modified to accept data instead of functional values
#Modified by: Jeffrey Harkness, M.S.
##Begin Simpson's rule function code
simp13 <- function(dat, h = 1){
ans = 2*h*(dat[1] + 4*dat[2] + dat[3])/6
return(ans)}
simp13m <- function(dat, h = 1){
summ <- dat[1]
n <- length(dat)
nseq <- seq(2,(n-2),2)
for(i in nseq){
summ <- summ + 4*dat[i] + 2*dat[i+1]}
summ <- summ + 4*dat[n-1] + dat[n]
result <- (h*summ)/3
return(result)}
simp38 <- function(dat, h = 1){
ans <- 3*h*(dat[1] + 3*sum(dat[2:3]) + dat[4])/8
return(ans)}
simpson = function(dat, h = 1){
hin = h
len = length(dat)
comp <- len %% 2 ## parity of the number of points
if(len == 2){
ans = sum(dat)/2*h} ##n = 2 is the trapezoidal rule
if(len == 3){
ans = simp13(dat, h = hin)}
if(len == 4){
ans = simp38(dat,h = hin)}
if(len == 6){
ans <- simp38(dat[1:4],h = hin) + simp13(dat[4:len],h = hin)}
if(len > 6 & comp == 0){
ans = simp38(dat[1:4],h = hin) + simp13m(dat[4:len],h = hin)}
if(len >= 5 & comp == 1){
ans = simp13m(dat,h = hin)}
return(ans)}
##End Simpson's rule function code
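For instance, applied to the question's smoothed data (71 points, an odd length, so the repeated 1/3 rule branch is used), the call is simply:
simpson(smoothed)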
This next section of code shows the performance comparison. This code can easily be altered for different test functions and cases.
The precision difference tends to change with the sample size and test function used; this example is not intended to imply that the difference is always this pronounced.
#other algorithm for comparison purposes, from Allan Cameron above
oa <- function(x, h = 1) sum((x[-1] + x[-length(x)]) / 2 * h)
#Testing and algorithm comparison code
simans = NULL; oaans = NULL; simerr = NULL; oaerr = NULL; mp = NULL
for( j in 1:10){
n = j
#f = function(x) cos(x) + 2 ##Test functions
f = function(x) 0.2 + 25*x - 200*x^2 + 675*x^3 - 900*x^4 + 400*x^5
a = 0;b = 10
h = (b-a)/n
datain = seq(a,b,by = h)
preans = integrate(f,a,b)$value #precise numerical estimate of test function
simans[j] = simpson(f(datain), h = h)
oaans[j] = oa(f(datain), h = h)
(simerr[j] = abs(simans[j] - preans)/preans * 100)
(oaerr[j] = abs(oaans[j] - preans)/preans * 100)
mp[j] = simerr[j] < oaerr[j]
}
(outframe = data.frame("simpsons percent diff" = simerr,"trapezoidal percent diff" = oaerr, "more precise?" = mp, check.names = F))
simpsons percent diff trapezoidal percent diff more precise?
1 214.73489738 214.734897 FALSE
2 15.07958148 64.993410 TRUE
3 6.70203621 29.816799 TRUE
4 0.94247384 16.955208 TRUE
5 0.54830021 10.905620 TRUE
6 0.18616767 7.593825 TRUE
7 0.12051767 5.588209 TRUE
8 0.05890462 4.282980 TRUE
9 0.04087107 3.386525 TRUE
10 0.02412733 2.744500 TRUE
I am looking to sample repeatedly from a distribution with a specific condition.
I am sampling 50 values for four iterations and saving the results. However, I need each individual result of an iteration to be smaller than the result at the same position in the previous iteration.
mu.c <- c(7,6,5,3) # Means of control chains
chains.sim <- function(vector, N) {
all.list <- list()
for (i in 1:length(vector)) {
Y <- MASS::rnegbin(n = N, mu = vector[i], theta = 4)
name <- paste('position:',i, sep = '')
all.list[[name]] <- Y
}
all.list
}
chains.sim(mu.c, 50)
The sampling part works fine, but the individual results in Y are of course not always smaller than the results at the same positions in the previous iteration ("position").
Is there a way to repeat the sampling process until the result is smaller?
I would really appreciate your help!
I would add a while loop inside your for loop which samples data sets until the condition is met.
mu.c <- c(7,6,5,3) # Means of control chains
chain.sim <- function(vector, N) {
all.list <- list()
all.list[[1]] <- MASS::rnegbin(n = N, mu = vector[1], theta = 4)
for (i in 2:length(vector)) {
is_smaller <- FALSE
while(!is_smaller){
Y <- MASS::rnegbin(n = N, mu = vector[i], theta = 4)
if (all(all.list[[i-1]] >= Y)) is_smaller <- TRUE
}
all.list[[i]] <- Y
}
all.list
}
chain.sim(mu.c, 3) # N = 3 so that the while loop terminates in reasonable time
Note that I changed the condition to >=, because if 0 is generated in any round, strictly smaller values can never be found. Also, with 50 elements this code will practically never stop, because it is extremely unlikely to draw two whole samples where every value is smaller, let alone 4 such samples.
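To get a feeling for how unlikely that is, here is a quick simulation of my own (using the first two means from the question) that estimates the probability that one fresh sample of size 50 is element-wise <= the previous one:
set.seed(1)
mean(replicate(10000, {
  a <- MASS::rnegbin(50, mu = 7, theta = 4)
  b <- MASS::rnegbin(50, mu = 6, theta = 4)
  all(b <= a)
})) # this proportion will almost surely print as 0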
Edit:
It can be made much faster by sampling element-wise, as you pointed out:
chain.sim <- function(vector, N) {
all.list <- list()
all.list[[1]] <- MASS::rnegbin(n = N, mu = vector[1], theta = 4)
for (i in 2:length(vector)) {
Y <- numeric(N)
for (j in 1:N){
previous_value <- all.list[[i-1]][j]
if (previous_value == 0){
Y[j] = 0
next
}
is_smaller <- FALSE
while(!is_smaller){
val <- MASS::rnegbin(1, mu = vector[i], theta = 4)
if (val <= previous_value) is_smaller <- TRUE
Y[j] <- val
}
}
all.list[[i]] <- Y
}
all.list
}
chain.sim(mu.c, 50)
If 0 is encountered at any position, no further simulation is necessary for that position, since we know the next value can only be 0. This makes the simulation much faster.
I would like to apply the rejection sampling method to simulate a random vector Y = (Y_1, Y_2) uniformly distributed on the unit disc D = \{(y_1, y_2) \in R^2 : \sqrt{y_1^2 + y_2^2} \le 1\}, where X = (X_1, X_2) is a random vector uniformly distributed on the square S = [-1, 1]^2 and the joint density is f(y_1, y_2) = \frac{1}{\pi} 1_D(y_1, y_2).
In the rejection method we need an envelope C * g(x) with f(x) \le C * g(x), and a proposed sample is accepted with probability f(x) / (C * g(x)). I am using the following code:
x=runif(100,-1,1)
y=runif(100,-1,1)
d=data.frame(x=x,y=y)
disc_sample=d[(d$x^2+d$y^2)<1,]
plot(disc_sample)
I have two questions:
Using the above code, logically, the size of d should be greater than the size of disc_sample, but when I inspect both of them I see 100 elements in each. How is this possible? Why are the sizes the same? (THIS PART IS SOLVED, thanks to the comment below.)
The question now:
How could I reformulate my code to give me the total number of samples needed to get 100 samples that satisfy the condition, i.e. the number of samples rejected until I obtained the 100 needed samples?
Thanks to r2evans for the answer, but I am looking to write something simpler: a while loop that stores all candidate samples inside a matrix or a data frame instead of a list, and then keeps just the samples that satisfy the condition. I modified the code from the answer to avoid lists and sapply, but it is not giving the needed result; it yields only one row.
i <- 0
samps <- data.frame()
goods <- data.frame()
nr <- 0L
sampsize <- 100L
needs <- 100L
while (i < needs) {
  # samps and goods are overwritten on every pass,
  # so only the very last draw survives the loop
  samps <- data.frame(x = runif(1, -1, 1), y = runif(1, -1, 1))
  goods <- samps[(samps$x^2 + samps$y^2) < 1, ]
  i <- i + 1
}
and I also thought about this:
i <- 0
j <- 0
samps <- matrix() # a 1 x 1 NA matrix: it has no room for a second column
goods <- matrix()
needs <- 100
while (j < needs) {
  samps[i, 1] <- runif(1, -1, 1) # i starts at 0, so this subscript never works
  samps[i, 2] <- runif(1, -1, 1)
  if (((samps[i, 1])^2 + (samps[i, 2])^2) < 1) {
    goods[j, 1] <- samps[i, 1]
    goods[j, 2] <- samps[i, 2]
    # j is never incremented here, so the loop cannot terminate
  }
  else {
    i <- i + 1
  }
}
but it is not working.
I would be very grateful for any help to modify the code.
As to your second question: you cannot reformulate your code to know precisely, in advance, how many draws it will take to get (at least) 100 accepted combinations. You can use a while loop and concatenate results until you have at least 100 such rows, and then truncate those over 100. Because using entropy piecewise (at scale) is "expensive", you might prefer to always over-estimate the rows you need and grab them all at once.
(Edited to reduce "complexity" based on homework constraints.)
set.seed(42)
samps <- vector(mode = "list")
goods <- vector(mode = "list")
nr <- 0L
iter <- 0L
sampsize <- 100L
needs <- 100L
while (nr < needs && iter < 50) {
iter <- iter + 1L
samps[[iter]] <- data.frame(x = runif(sampsize, -1, 1), y = runif(sampsize, -1, 1))
rows <- (samps[[iter]]$x^2 + samps[[iter]]$y^2) < 1
goods[[iter]] <- samps[[iter]][rows, ]
nr <- nr + sum(rows)
}
iter # number of times we looped
# [1] 2
out <- head(do.call(rbind, goods), n = 100)
NROW(out)
# [1] 100
head(out) ; tail(out)
# x y
# 1 0.8296121 0.2524907
# 3 -0.4277209 -0.5668654
# 4 0.6608953 -0.2221099
# 5 0.2834910 0.8849114
# 6 0.0381919 0.9252160
# 7 0.4731766 0.4797106
# x y
# 221 -0.65673577 -0.2124462
# 231 0.08606199 -0.7161822
# 251 -0.37263236 0.1296444
# 271 -0.38589120 -0.2831997
# 28 -0.62909284 0.6840144
# 301 -0.50865171 0.5014720
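That said, if the goal is literally to count how many draws are rejected on the way to 100 accepted points, a one-draw-at-a-time while loop does it, at the cost of generating the random numbers piecewise. Here is a sketch (the names total, accepted and goods are mine):
set.seed(42)
needs <- 100L
goods <- matrix(NA_real_, nrow = needs, ncol = 2, dimnames = list(NULL, c("x", "y")))
accepted <- 0L
total <- 0L
while (accepted < needs) {
  total <- total + 1L
  pt <- runif(2, -1, 1)
  if (sum(pt^2) < 1) { # the point falls inside the unit disc: accept it
    accepted <- accepted + 1L
    goods[accepted, ] <- pt
  }
}
total - needs # number of rejected draws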
I am taking baby steps in using metaheuristics for solving constrained optimization problems. I am trying to solve the basic Markowitz mean-variance optimization model (given below) using the NMOF package in R.
\min_w \; \lambda \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \Sigma_{ij} - (1 - \lambda) \sum_{i=1}^{N} w_i \mu_i
subject to
\sum_{i=1}^{N} w_i = 1
0 \le w_i \le 1, \quad i = 1, \ldots, N
where \lambda takes values between 0 and 1 and N is the number of assets. In matrix form the objective is \lambda \, w^\top \Sigma w - (1 - \lambda) \, w^\top \mu.
Following is my code (based on the book Numerical Methods and Optimization in Finance):
library(NMOF)
na <- dim(fundData)[2L]
ns <- dim(fundData)[1L]
Sigma <- cov(fundData)
winf <- 0.0
wsup <- 1.0
m <- colMeans(fundData)
resample <- function(x,...) x[sample.int(length(x),...)]
data <- list(R = t(fundData),
m = m,
na = dim(fundData)[2L],
ns = dim(fundData)[1L],
Sigma = Sigma,
eps = 0.5/100,
winf = winf,
wsup = wsup,
nFP = 100)
w0 <- runif(data$na); w0 <- w0/sum(w0)
OF <- function(w, data){
    wmu <- crossprod(w, data$m)
    res <- crossprod(w, data$Sigma)
    res <- tcrossprod(w, res)
    res - wmu
}
neighbour <- function(w, data){
toSell <- w > data$winf
toBuy <- w < data$wsup
i <- resample(which(toSell), size = 1L)
j <- resample(which(toBuy), size = 1L)
eps <- runif(1) * data$eps
eps <- min(w[i] - data$winf, data$wsup - w[j], eps)
w[i] <- w[i] - eps
w[j] <- w[j] + eps
w
}
algo <- list(x0 = w0, neighbour = neighbour, nS = 5000L)
system.time(sol1 <- LSopt(OF, algo, data))
I am not sure how to include lambda in the objective function (OF); the above code does not use it at all. I tried using a for loop, but it resulted in the following error:
OF <- function(w, data){
    lambdaSeq <- seq(.001, 0.999, length = data$nFP)
    for(lambda in lambdaSeq){
        wmu <- crossprod(w, data$m)
        res <- crossprod(w, data$Sigma)
        res <- tcrossprod(w, res)
        result <- lambda*res - (1-lambda)*wmu
    }
    # the for loop is the last expression in the body, so OF returns NULL
}
Error:
Local Search.
Initial solution:
| | 0%
Error in if (xnF <= xcF) { : argument is of length zero
Timing stopped at: 0.01 0 0.03
It would be nice if someone could help me in this regard.
P.S.: I am also aware that this can be solved using quadratic programming. This is just a first step towards including other constraints.
If I understand correctly, you want to replicate the mean-variance efficient frontier by Local Search? Then you need to run a Local Search for every value of lambda that you want to include in the frontier.
The following example should help you get going. I start by attaching the package and setting up the list data.
require("NMOF")
data <- list(m = colMeans(fundData), ## expected returns
Sigma = cov(fundData), ## expected var of returns
na = dim(fundData)[2L], ## number of assets
eps = 0.2/100, ## stepsize for LS
winf = 0, ## minimum weight
wsup = 1, ## maximum weight
lambda = 1)
Next I compute a benchmark for the minimum-variance case (i.e. lambda equals one).
## benchmark: the QP solution
## ==> this will only work with a recent version of NMOF,
## which you can get by saying:
## install.packages('NMOF', type = 'source',
## repos = c('http://enricoschumann.net/R',
## getOption('repos')))
##
require("quadprog")
sol <- NMOF:::minvar(data$Sigma, 0, 1)
Objective function and neighbourhood function. I have slightly simplified both functions (for clarity; using crossprod in the objective function would probably be more efficient).
OF <- function(w, data){
data$lambda * (w %*% data$Sigma %*% w) -
(1 - data$lambda) * sum(w * data$m)
}
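For reference, the crossprod version mentioned above could look like this (a sketch, not from the original answer; drop() merely turns the 1 x 1 matrix into a plain number):
OF2 <- function(w, data)
    drop(data$lambda * crossprod(w, data$Sigma %*% w) -
         (1 - data$lambda) * crossprod(w, data$m))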
neighbour <- function(w, data){
toSell <- which(w > data$winf)
toBuy <- which(w < data$wsup)
i <- toSell[sample.int(length(toSell), size = 1L)]
j <- toBuy[sample.int(length(toBuy), size = 1L)]
eps <- runif(1) * data$eps
eps <- min(w[i] - data$winf, data$wsup - w[j], eps)
w[i] <- w[i] - eps
w[j] <- w[j] + eps
w
}
Now we can run Local Search. Since it is a fairly large dataset (200 assets), you will need a relatively large number of steps to reproduce the QP solution.
w0 <- runif(data$na) ## a random initial solution
w0 <- w0/sum(w0)
algo <- list(x0 = w0, neighbour = neighbour, nS = 50000L)
sol1 <- LSopt(OF, algo, data)
You can compare the weights you get from Local Search with the QP solution.
par(mfrow = c(3,1), mar = c(2,4,1,1), las = 1)
barplot(sol, main = "QP solution")
barplot(sol1$xbest, main = "LS solution")
barplot(sol - sol1$xbest,
ylim = c(-0.001,0.001)) ## +/-0.1%
Finally, if you want to compute the whole frontier, you need to rerun this code for different levels of data$lambda.
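A minimal sketch of that rerun (my illustration, not part of the original answer; the grid of lambda values and the printBar/printDetail options used to silence the output are assumptions):
lambdas <- seq(0.05, 0.95, length.out = 25) ## illustrative grid
frontier <- vector("list", length(lambdas))
for (k in seq_along(lambdas)) {
    data$lambda <- lambdas[k]
    w0 <- runif(data$na); w0 <- w0/sum(w0)
    algo <- list(x0 = w0, neighbour = neighbour, nS = 50000L,
                 printBar = FALSE, printDetail = FALSE)
    frontier[[k]] <- LSopt(OF, algo, data)$xbest
}
## risk/return coordinates of the frontier portfolios
rets <- sapply(frontier, function(w) sum(w * data$m))
vars <- sapply(frontier, function(w) c(w %*% data$Sigma %*% w))
plot(sqrt(vars), rets, xlab = "sd of return", ylab = "expected return")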
I wrote the following code and got the error "number of items to replace is not a multiple of replacement length" at this line of code:
X_after[count, ] = c(censN1, censN2, censN3)
After searching the internet, I found the problem is probably caused by a mismatch between the predetermined number of columns (n_samples) and the number of values actually generated in each iteration. How can I adjust the matrix code so that ncol is determined dynamically after the loop, rather than predetermined as n_samples? Or if you have other solutions to this error message, please chime in as well.
multiLodSim <- function (GM, GSD, n_samples, n_iterations, p) {
X_after <- matrix(NA_real_, nrow = n_iterations, ncol = n_samples)
delta <- matrix(NA_real_, nrow = n_iterations, ncol = n_samples)
mu <- log(GM)
sigma <- log(GSD)
lod1 <- quantile(rlnorm(100000,mu,sigma),p)
lod2 <- quantile(rlnorm(100000,mu,sigma),(p*0.95))
lod3 <- quantile(rlnorm(100000,mu,sigma),(p*0.9))
pct_cens <- numeric(n_iterations)
count <- 1
while(count <= n_iterations) {
sub_samples = n_samples/3 # divide the total sample into thirds (for 3 LODs); note this may not be a whole number
n1 <- rlnorm(sub_samples,mu,sigma)
censN1 <- sort(pmax(n1,lod1))
n2 <- rlnorm(sub_samples,mu,sigma)
censN2 <- sort(pmax(n2,lod1))
censN2[censN2==lod1] <- lod2
n3 <- rlnorm(sub_samples,mu,sigma)
censN3 <- sort(pmax(n3,lod1))
censN3 [censN3==lod1] <- lod3
X_after[count, ] = c(censN1, censN2, censN3)
delta[count, ] = X_after[count, ] <= lod1 # nondetects = TRUE (1), detects = FALSE (0)
pct_cens [count] = mean(delta[count,]) #
if (pct_cens[count] > 0 & pct_cens[count] < 1) count <- count + 1
}
list(X_after = X_after, delta = delta, pct_cens = pct_cens) # return the results
}
a = multiLodSim(GM=1,GSD=2,n_samples=20,n_iterations=5,p=0.3)
Update: After reading your comments, I made changes to these lines of code and it is working now. Thank you for your help.
n1 = rlnorm(round(sub_samples), mu, sigma)
n2 = rlnorm(round(sub_samples), mu, sigma)
sub_samples3 = n_samples - length(n1) - length(n2)
n3 = rlnorm(sub_samples3, mu, sigma)
Your problem lies in the fact that sub_samples = n_samples/3 is not a whole number.
When you create a sample of fractional size, it creates a sample of floor(size):
length(rlnorm(1.5,1,1))
## [1] 1
Thus, when you recombine your data, length(c(censN1, censN2, censN3)) does not (necessarily) equal n_samples.
Thus, you need a method for dealing with numbers of samples that are not divisible by 3.
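One simple possibility (a sketch; the helper name split3 is mine) is to use integer division and give the remainder to the first group(s), so that the three group sizes always sum to n_samples:
split3 <- function(n_samples) {
  base <- n_samples %/% 3
  sizes <- rep(base, 3)
  remainder <- n_samples %% 3
  if (remainder > 0) sizes[seq_len(remainder)] <- base + 1L
  sizes
}
split3(20) # 7 7 6 -- the three groups sum to 20
Then n1, n2 and n3 can be drawn with rlnorm(sizes[1], mu, sigma) and so on, and c(censN1, censN2, censN3) always has length n_samples.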