Moment Matching Scenario Generation in R

I am working on a portfolio optimization algorithm, and part of the problem consists of generating moment-matching scenarios.
Because of its simplicity and speed, I chose to follow the paper "An algorithm for moment-matching scenario generation with application to financial portfolio optimization" (Ponomareva, Roman and Date).
The problem is that, even though the mathematics is very simple, some of the probability weights p_i come out negative, although the formulas in the paper should ensure otherwise. If I add a loop that reruns the algorithm until it finds an all-positive combination, it essentially runs forever.
Here is the part of my code, based on the paper, where things get stuck:
dummy1 = 0
while (dummy1 <=0 | dummy1 >= 1) {
dummy1 = round(rnorm(1, mean = 0.5, sd = 0.25), 2)
}
diag.cov.returns = diag(cov.returns)
Z = dummy1 * sqrt (diag.cov.returns) #Vector Z according to paper formula
ZZT = Z %*% t(Z)
LLT = cov.returns - ZZT
L = chol(LLT) #cholesky decomposition to get matrix L
s = sample (1:5, 1)
F1 = 0
F2 = -1
S = (2*N*s)+3
while (((4*F2)-(3*F1*F1)) < 0) {
#Gamma = (2*s*s)*(((N*mean.fourth) - (0.75*(sum(Z^4)* (N*mean.third/sum(Z^3))^2)))/sum(L^4))
#Gamma is necessary if we want to get p from Uniform Distribution
#U = runif(s, 0, 1)
U = rgamma(s, shape = 1, scale = ((1/exp(1)):1))
#p = (s*(N/Gamma)) + ((1/(2*N*s)) - (s/(N*Gamma)))*U
p = (-log(U, base = exp(1)))
p = p/(((2*sum(p))+max(p))*N*s) #this array is expected to be positive and bounded between 0 and 1
q1 = 1/p
pz = p
p[s+1] = (1-(2*N*sum(p))) #extra point necessary to get the 3 moment matching probabilities
F1 = (N*mean.third*sqrt(p[s+1]))/(sum(Z^3))
F2 = p[s+1]*(((N*mean.fourth) - (1/(2*s*s))*sum(L^4)*(sum(1/p)))/sum(Z^4))
}
alpha = (0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
beta = -(0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
w1 = 1/(alpha*(alpha+beta))
w2 = 1/(beta*(alpha+beta))
w0 = 1 - (1/(alpha*beta))
P = rep(pz, 2*N) #Vector with Probabilities starting from p + 3 extra probabilities to match third and fourth moments
P[(2*N*s)+1] = p[s+1]*w0
P[(2*N*s)+2] = p[s+1]*w1
P[(2*N*s)+3] = p[s+1]*w2
Unfortunately, I cannot disclose the input dataset containing the fund returns, but I can be more specific. I start from a data.frame() containing the returns of N assets (in my case, 11 funds with monthly returns from 30/01/2001 to 30/09/2020). From it I compute the mean returns, the covariance matrix, the central third and fourth moments (NOT skewness and kurtosis) and the averages. The algorithm then proceeds as in the code above. The point where I get stuck is that p also takes negative values. This is a problem because the first s elements of p are later used as probabilities in P.
I hope this makes the problem clearer. The paper does report the data used by the authors, but it would have to be entered into R by hand. In any case, any data.frame() containing asset returns will do.
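One possible source of the negative weights, offered as an assumption and not checked against the paper: in the code above, U is drawn with rgamma() instead of the commented-out runif(), and p is then computed as -log(U). A gamma draw can exceed 1, and -log(U) is negative whenever U > 1. Note also that ((1/exp(1)):1) evaluates to the single value 1/exp(1), so the scale is a constant. The sketch below compares the two choices of U; the sample size and seed are arbitrary.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
s <- 10000     # large sample so the effect is visible (not the s of the question)

# U as drawn in the question: Gamma(shape = 1, scale = 1/e), unbounded above
U_gamma <- rgamma(s, shape = 1, scale = exp(-1))
# U as in the commented-out line: Uniform(0, 1), always strictly below 1
U_unif <- runif(s, 0, 1)

# Unnormalised weights, as in p = -log(U)
p_gamma <- -log(U_gamma)
p_unif  <- -log(U_unif)

# Share of negative weights under each choice of U
share_negative_gamma <- mean(p_gamma < 0)
share_negative_unif  <- mean(p_unif < 0)
```

If this is indeed the cause, drawing U from runif() would keep every -log(U) positive, and the normalisation step would then keep p[s+1] positive as well.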

Related

Is there a way to test a range of exponents in a lm() model in the same way as the code below more efficiently?

The basic gist is that I have a set of housing data for which I need to build a model that minimizes the difference between the predicted and the actual house price. So I wrote this bit of code to test a range of different numerators and find the one that minimizes that difference. I'm using the median instead of the mean because the data isn't exactly normal.
Since I only have experience with lm(), I'm using it to create the coefficients and C values. But since the model likes exponents, I also have to test various exponents. The code does this for each of the variables and then goes back to the first and re-evaluates it based on the other exponents. The model starts out with all the exponents equal to 1, so the same as the basic linear model. I know that this is probably horribly inefficient and uses a lot of code in a somewhat wasteful way, but I'm in my first R class, so sorry about the mess and/or convoluted coding logic.
Is there any way to do the same thing more efficiently? Also, I can't really decrease the number of variables, as the model likes having more variables and produces a greater margin of error when they aren't present.
w <- seq(1,10000,1)
r <- seq(1,10000,1)
t <- seq(1,10000,1)
z <- seq(1,10000,1)
s <- seq(1,10000,1)
coef_1 <- c(6000,6000,6000,6000,6000,6000,6000,6000)
v <- rep(6000, each = 8)
for(l_1 in 1:10){
for(t_1 in 1:8){
for(i in 1:10000){
t = t_1
coef_1[t] = i
mod5 <- lm(log(SALE_PRC) ~ I(TOT_LVG_AREA^((coef_1[1]-5000)/1000)) + I(LND_SQFOOT^((coef_1[2]-5000)/1000)) + I(RAIL_DIST^((coef_1[3]-5000)/1000)) + I(OCEAN_DIST^((coef_1[4]-5000)/1000)) + I(CNTR_DIST^((coef_1[5]-5000)/1000)) + I(HWY_DIST^((coef_1[6]-5000)/1000)) + I(structure_quality^((coef_1[7]-5000)/1000)) + SUBCNTR_DI + SPEC_FEAT_VAL + (exp(((coef_1[8]-5000)/1000)*SPECIAL_RATIO)) + age, data = kaggle_transform_final)
kaggle_new <- kaggle_transform_final %>%
add_predictions(model = mod5, var = "prediction") %>%
mutate(new_predict = exp(prediction)) %>%
mutate(new_difference = abs((new_predict-SALE_PRC))/SALE_PRC) %>%
mutate(average_percent_difference = median(new_difference)) %>%
mutate(mean_percent_difference = mean(new_difference)) %>%
mutate(quart_75 = quantile(new_difference,.75))
w[i] = kaggle_new$average_percent_difference[1]
r[i] = kaggle_new$mean_percent_difference[1]
t[i] = kaggle_new$quart_75[1]
z[i] = i
s[i] = (i-5000)/1000
if(i%%100 ==0){show(i)}
}
u <- data.frame(median_diff = w, mean_diff = r, quart_75 = t, actual = s, number = z) %>%
arrange(median_diff)
coef_1[t_1] <- u$number[1]
v[t_1] <- u$actual[1]
show(coef_1)
}
coef_1 <- coef_1
}
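A minimal sketch of one alternative to the grid search above: wrap the lm() fit in a loss function of the exponents and let optim() search over them. This swaps the coordinate-wise grid for Nelder-Mead and is not the method used in the question. The data frame d, its columns area, dist and price, and the true exponents are all hypothetical and simulated here, since kaggle_transform_final is not available.

```r
set.seed(1)
n <- 200
# Hypothetical data: two predictors and a price driven by powers of them
d <- data.frame(area = runif(n, 50, 300), dist = runif(n, 1, 50))
d$price <- exp(10 + 0.2 * d$area^0.5 - 0.1 * d$dist^0.8 + rnorm(n, sd = 0.05))

# Median absolute percentage error of the log-linear model for given exponents
loss <- function(expo, data) {
  fit  <- lm(log(price) ~ I(area^expo[1]) + I(dist^expo[2]), data = data)
  pred <- exp(fitted(fit))
  median(abs(pred - data$price) / data$price)
}

# Start from exponents of 1 (the plain linear model) and search jointly
res <- optim(par = c(1, 1), fn = loss, data = d, method = "Nelder-Mead")
res$par    # estimated exponents
res$value  # median absolute percentage error at those exponents
```

Each call to loss() fits one model, so the search needs a few hundred fits at most instead of 10000 per variable per pass. The median-based loss is not smooth, so the result may be a local optimum; trying several starting points is a cheap safeguard.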

CLT - Central limit theorem, how to use it in R?

I've learned the CLT, and I have a question.
The average of 100 observations is taken, where the expectation is M and the variance is 9.
I need to find the lower and upper bounds a and b such that the probability is at least 0.9:
p( a <= x(100)-M <= b ) >= 0.9
where X(100) is the average of x over 100 observations.
How do I do this in R? I can't figure it out. I wrote something like this:
numt <- 1:100
cbind(numt , 1-2*pnorm(-b/3))
but I understand it doesn't work well.
I am not sure whether the following is what the question asks for.
The following function takes as input
M - the expectation,
Var - the variance,
n - the sample size,
prob - the two-sided probability.
Arguments Var, n and prob default to the values in the question, so this can be seen as a function of M only.
fun <- function(M, Var = 9, n = 100, prob = 0.90){
p <- c((1 - prob)/2, 1 - (1 - prob)/2)
qq <- qnorm(p, mean = M, sd = sqrt(Var/n))
setNames(qq, c("lower", "upper"))
}
M <- 1
fun(M)
# lower upper
#0.5065439 1.4934561
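Since the inequality in the question is stated for the centred average x(100) - M, the bounds a and b can also be computed directly with mean 0, which makes them independent of M. A small sketch under the same normal approximation; the names se, tail, ab and coverage are introduced here for illustration.

```r
n        <- 100
variance <- 9
prob     <- 0.90

se   <- sqrt(variance / n)   # standard error of the average: 0.3
tail <- (1 - prob) / 2       # probability left in each tail

# Symmetric bounds for x(100) - M, which is approximately N(0, se^2)
ab <- qnorm(c(tail, 1 - tail), mean = 0, sd = se)
names(ab) <- c("a", "b")
ab

# Check: the interval [a, b] covers probability prob
coverage <- pnorm(ab[["b"]], sd = se) - pnorm(ab[["a"]], sd = se)
```

These are the lower and upper values from fun(M) shifted by M, about -0.4935 and 0.4935.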

Error in check_for_unknown_vars_impl(model, the_ast) : The expression contains a variable that is not part of the model

The task was to find the minimum distance-quantity value; however, the code keeps throwing the following error:
Error in check_for_unknown_vars_impl(model, the_ast) : The expression contains a variable that is not part of the model.
I really cannot find what the problem is. The code is shown below:
#load the data of distance from factory to distribution center
distance_factorydc <- read_excel("AutoParts24.xlsx",sheet="Factories_to_DCs")
distance_factorydc<-distance_factorydc[,-1]
#load the data of distance from distribution center to customers
distance_dccustomer<- read_excel("AutoParts24.xlsx",sheet="DCs_to_Customers")
distance_dccustomer<-distance_dccustomer[-c(5:6),-1]
#the demand of the customer
customerdemand<- read_excel("AutoParts24.xlsx",sheet="DCs_to_Customers")
customerdemand<-customerdemand[6,-1]
# the ILP model is created
model <- MIPModel() %>%
# set F as a continuous variables
# Fij is the amount that shipped from factory i to distribution center j
add_variable(F[i, j], i = 1:3, j = 1:4, type = "continuous", lb = 0) %>%
# set C as a continuous variables
# Cjk is the amount that shipped from distribution center j to customer k
add_variable(C[j, k], j =1:4 , k = 1:30, type = "continuous", lb = 0) %>%
# minimize the total cost
set_objective(sum_expr(distance_factorydc[i,j]* F[i, j], i = 1:3, j = 1:4) +
sum_expr(distance_dccustomer[j,k]* C[j, k], j = 1:4, k = 1:30) ,"min")%>%
# the total amount that shipped from distribution center to customers
# should >= the total demand of the customer
add_constraint(sum_expr(F[j, k],j=1:4) >= sum_expr(customerdemand[k],k=1:30)) %>%
# the total amount that shipped from factory to distribution center should be the same as
# the amount that shipped from distribution center to customers
add_constraint(sum_expr(F[i, j], i = 1:3) >= sum_expr(C[j, k] , k = 1:30), j=1:4)
I am unsure how the data you are importing is structured, but the distance_factorydc, distance_dccustomer, and customerdemand variables seem to be the problem in the model. Make sure the indices in the model refer to the correct rows and columns. You may also convert them to vectors and use those in the model.
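A minimal base-R sketch of the conversion suggested above. The data frame sheet is a hypothetical stand-in for what read_excel() returns (a label column, distance rows, and a demand row); its layout is an assumption, as is the idea that plain numeric objects are what the model expressions need.

```r
# Hypothetical stand-in for an imported sheet: first column holds labels,
# the last row holds customer demand, the other rows hold distances.
sheet <- data.frame(
  label = c("DC1", "DC2", "Demand"),
  cust1 = c(12, 30, 5),
  cust2 = c(25, 8, 7)
)

# Distances as a plain numeric matrix: dist_mat[j, k] is a single number
dist_mat <- as.matrix(sheet[1:2, -1])

# Demand as a plain numeric vector: demand[k] is a single number
demand <- as.numeric(unlist(sheet[3, -1]))

dist_mat[2, 1]
demand[2]
```

Indexing a data frame or tibble with [i, j] returns another data frame, not a bare number, so converting once up front keeps the indexing inside the model unambiguous.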
add_constraint(sum_expr(F[j, k],j=1:4) >= sum_expr(customerdemand[k],k=1:30))
looks suspicious to me:
F[j,k] should be something like F[i,j]
The k in F[j,k] is not bound: k is only defined inside sum_expr(customerdemand[k],k=1:30)
This code needs a bit more care.

Solving double (n-tuple) multivariate integrals in R

I would like to write code to solve this kind of equation:
For that I wrote the code below, however it does not solve the problem. Do you have any ideas about the possibility to solve this kind of integrals in R?
t_0 = 15
mu = 0.1
lambda = 0.8
f = function(x1,x2) exp(mu*(x1+x2))*dexp(log(lambda)*(x1+x2))
f_comp = function(x2) f(x1,x2)
f_1 = function(x1) {integrate(f_comp,upper = t_0, lower = x1)}
result = integrate(f = f_1, lower = 0, upper = t_0)$value
--------- edit:
Given the answer below, I adapted the code to my example, but I still think it is not correct; at least, the value 0 for the integral does not make sense.
integrate(function(x1) {
sapply(x1, function(x1){
integrate(function(x2) exp(mu*(x1+x2))*dexp(log(lambda)*(x1+x2)), lower = x1, upper = t_0)$value
})
}, 0, t_0)
By the way, I would like to build a general procedure for this (that is why I don't just calculate the integral by hand): not only double integrals but also n-tuple integrals, so I need a general procedure for this kind of calculation.
Make a picture of the domain of integration. This is a simplex (a triangle) with vertices (0,0), (0,t0), (t0,t0). To evaluate an integral on a simplex, the SimplicialCubature package is the way to go.
t0 = 15
mu = 0.1
lambda = 0.8
library(SimplicialCubature)
f <- function(xy){
x <- xy[1]; y <- xy[2]
exp(-mu*(x+y)) * (1-exp(-lambda*(x+y)))
}
S <- cbind(c(0,0), c(0,t0), c(t0,t0))
adaptIntegrateSimplex(f, S)$integral
# 29.55906
integrate(function(x1) {
sapply(x1, function(x1){
integrate(function(x2) exp(-mu*(x1+x2))*(1-exp(-lambda*(x1+x2))), lower = x1,
upper = t0)$value
})
}, 0, t0)$value
# 29.55906
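For the n-tuple case asked about in the question, the nested integrate() call above can be generalised by recursion. The sketch below is a base-R alternative, not part of the SimplicialCubature package; the function name nested_integral is hypothetical, and it assumes the domain is the ordered region 0 <= x1 <= x2 <= ... <= xn <= t0, which matches the triangle above for n = 2.

```r
# Integrate f over 0 <= x1 <= x2 <= ... <= xn <= t0 by nesting integrate().
# f takes a numeric vector of length n and returns a single number.
nested_integral <- function(f, n, t0) {
  inner <- function(fixed) {
    k <- length(fixed)
    if (k == n) return(f(fixed))             # all coordinates fixed: evaluate
    lower <- if (k == 0) 0 else fixed[k]     # next coordinate starts at the previous one
    integrate(function(xs) {
      # integrate() passes a vector of nodes, so evaluate them one by one
      sapply(xs, function(x) inner(c(fixed, x)))
    }, lower = lower, upper = t0)$value
  }
  inner(numeric(0))
}

t0 <- 15
mu <- 0.1
lambda <- 0.8
g <- function(x) exp(-mu * sum(x)) * (1 - exp(-lambda * sum(x)))

double_int <- nested_integral(g, n = 2, t0 = t0)
triple_vol <- nested_integral(function(x) 1, n = 3, t0 = 1)  # volume of the region
```

The cost grows exponentially with n, because every level multiplies the number of function evaluations, so for more than a few dimensions a cubature routine such as adaptIntegrateSimplex is likely the better choice.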

Portfolio optimization

I am trying to build a portfolio in R that is optimized with respect to another (benchmark) portfolio.
I am trying to minimize the objective function
$$\min_w \operatorname{Var}(r_p - r'w_{bm})$$
with the constraints
$$1_n'w = 1$$
$$w > 0.005$$
$$w < 0.8$$
with w being the weights of the portfolio, so that $r_p = r'w$. There are 10 securities, so I set the benchmark weights at 0.1 each.
I know that
$$\operatorname{Var}(r_p - r'w_{bm}) = \operatorname{Var}(r_p) + \operatorname{Var}(r'w_{bm}) - 2\operatorname{Cov}(r_p, r'w_{bm}) = w'\operatorname{Var}(r)w + \operatorname{Var}(r'w_{bm}) - 2\operatorname{Cov}(r'w, r'w_{bm})$$
and, since $\operatorname{Var}(r'w_{bm})$ does not depend on w, minimizing this is equivalent to minimizing
$$w'\operatorname{Var}(r)w - 2\operatorname{Cov}(r, r'w_{bm})'w$$
The last expression is of the form I need, so I tried to solve this with solve.QP in R. The constraints are giving me a problem, though.
Here is my code:
trackport <- array(rnorm(obs * assets, mean = .2, sd = .15), dim = c(obs,
assets)) #this is the portfolio which the assets are tracked against
wbm <- matrix(rep(1/assets, assets)) #random numbers for the weights
Aeq <- t(matrix(rep(1,assets), nrow=assets, ncol = 1)) #col of 1's to add
#the weights
Beq <- 1 # weights should sum to 1's
H = 2*cov(trackport) #times 2 because of the syntax
#multiplies the returns times coefficients to create a vector of returns for
#the benchmark
rbm = trackport %*% wbm
#covariance between the tracking portfolio and benchmark returns
eff <- cov(trackport, rbm)
#constraints
Amatrix <- t(matrix(c(Aeq, diag(assets), -diag(assets)), ncol = assets,
byrow = T))
Bvector <- matrix(c(1,rep(.005, assets), rep(.8, assets)))
#solve
solQP3 <- solve.QP(Dmat = H,
dvec = zeros, #reduces to min var portfolio for
#troubleshooting purposes
Amat = Amatrix,
bvec = Bvector,
meq = 1)
The error I am getting is "constraints are inconsistent, no solution!", but I can't find what's wrong with my A matrix.
My (transposed) A matrix looks like this:
[1,1,...,1]
[1,0,...,0]
[0,1,...,0]
...
[0,0,...,1]
[-1,0,...,0]
[0,-1,...,0]
...
[0,0,...,-1]
and my $b_0$ looks like this:
[1]
[.005]
[.005]
...
[.005]
[.8]
[.8]
...
[.8]
So I'm not sure why it isn't finding a solution. Could anyone take a look?
I'm not familiar with the package, but I just took a quick look at https://cran.r-project.org/web/packages/quadprog/quadprog.pdf , which apparently is what you are using.
Your RHS values of .8 should be -0.8 because this function uses ≥ inequalities. So you have been constraining the variables to be ≥ .005 and ≤ -0.8, which of course is not what you want, and is infeasible.
So leave the transposed A as is and set
b0:
[1]
[.005]
[.005]
...
[.005]
[-.8]
[-.8]
...
[-.8]
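The sign issue can be checked without running the solver. The sketch below uses base R only: it rebuilds the constraint matrix from the question in solve.QP's convention, t(Amat) %*% w >= bvec, and tests a candidate point, the equal-weight portfolio, against both right-hand sides. The variable names are introduced here for illustration.

```r
assets <- 10                    # number of securities, as in the question
w <- rep(1 / assets, assets)    # candidate point: equal weights

# One constraint per column: sum(w) = 1, then w_i >= ..., then -w_i >= ...
Amat <- cbind(rep(1, assets), diag(assets), -diag(assets))

bvec_original  <- c(1, rep(0.005, assets), rep(0.8, assets))   # from the question
bvec_corrected <- c(1, rep(0.005, assets), rep(-0.8, assets))  # from the answer

lhs <- drop(t(Amat) %*% w)
tol <- 1e-12                    # tolerance for floating-point rounding

feasible_original  <- all(lhs >= bvec_original - tol)   # FALSE: needs -w_i >= 0.8
feasible_corrected <- all(lhs >= bvec_corrected - tol)  # TRUE: needs -w_i >= -0.8
```

With the original vector no point at all can pass, because w_i >= 0.005 and -w_i >= 0.8 contradict each other. With the corrected vector the feasible set is non-empty, so the "constraints are inconsistent" error should no longer be triggered by these bounds.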
