RiskPortfolios Package Optimal Portfolio results - R

I'm trying to use the RiskPortfolios package to find the optimal portfolio weights for a couple of different optimizations, with long only and weights sum to 100% constraints.
Using the sample data provided with the package as an example (given below), I am getting portfolio weights that are all ~0% for all securities.
library("RiskPortfolios")
data("Industry_10")
rets = Industry_10
covEstimation(rets)
Sigma = covEstimation(rets)
optimalPortfolio(Sigma = Sigma, control = list(type = 'maxdiv', constraint = 'lo'))
https://www.rdocumentation.org/packages/RiskPortfolios/versions/2.1.7/topics/optimalPortfolio
results:
[1] 1.596802e-03 4.426586e-02 5.115952e-21 1.829356e-01 9.853242e-18
[6] 1.092012e-01 1.528876e-01 1.821066e-01 3.270063e-01 4.557049e-21
Does anyone have any experience with this package or info on where I'm going wrong?
Thanks

The weights do sum to 1 (i.e. 100%); the output is just printed in scientific notation. For instance, the 4th weight is 1.829356e-01, i.e. about 18.3%, while values like 5.115952e-21 are effectively zero.
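A quick sanity check, using the same call as above, is to store the weights and print them in percent rather than scientific notation:
w <- optimalPortfolio(Sigma = Sigma, control = list(type = 'maxdiv', constraint = 'lo'))
sum(w)            # equals 1, i.e. the portfolio is fully invested
round(100 * w, 2) # weights in percent; values like 5e-21 show up as 0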

Related

lpSolve - find maximum value - formula with multiplications

I'm looking for a way to use lpSolve in a similar way to how I successfully use it in Excel. I've calculated the elasticity for various products. Based on whether a product is elastic or not, I want to give advice on which price to ask.
I have the following values:
current.price = 15
sales.lastmonth = 50
elasticity = -1.5
And I want to optimize the sales.prediction by changing the suggested.price
Formula:
sales.prediction = (sales.lastmonth - ((abs(elasticity) * (suggested.price - current.price)) * (sales.lastmonth / current.price))) * suggested.price
I've tried the following:
# install.packages("lpSolve")
library(lpSolve)
objective.fn <- c(sales.prediction) # determine what the objective is
# constrain sales.lastmonth and current.price
const.mat <- matrix(c(1,1),ncol=1,byrow=T)
const.dir <- c("=","=")
const.rhs <- c(sales.lastmonth, current.price)
lp("max",objective.fn,const.mat,const.dir,const.rhs,compute.sens=TRUE)
Any suggestions?
-----------
EDIT - V2: Based on the comment below, I would like to add the constraint that the suggested.price should be at most 25% higher or 25% lower than the price I already have in mind without the optimization. For example, let's say I am already thinking about lowering the price to 12.5. Is that possible as well?
If I understand correctly, you are simply trying to optimize (maximize) this function; there are no constraints, since your other variables are constant. If this is true, then you can do:
optimize(
  f = function(x) {
    sales.lastmonth - ((abs(elasticity) * (x - current.price)) * (sales.lastmonth / current.price)) * x
  },
  interval = c(-100, 100),
  maximum = TRUE
)
$maximum
[1] 7.5
$objective
[1] 331.25
maximum at 7.5.
Edit: a simple hack to limit the optimization is to use the interval argument; for your edit, interval = c(current.price*0.75, current.price*1.25).
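As a concrete sketch of that hack, reusing the objective from above (swap current.price for the 12.5 you already have in mind if that is the intended reference price):
optimize(
  f = function(x) {
    sales.lastmonth - ((abs(elasticity) * (x - current.price)) * (sales.lastmonth / current.price)) * x
  },
  interval = c(current.price * 0.75, current.price * 1.25),
  maximum = TRUE
)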

R - Portfolio Analytics optimization not working

I'm currently working on creating a minimum-variance portfolio and decided to use the
function optimize.portfolio of the PortfolioAnalytics package.
Unfortunately, when extracting the weights, all of them are NA, even though none of my returns have any NA values, which would be the only reason (from my point of view) for the resulting weights to be NA.
My dataset consists of multiple assets (5000+), each with 60 observations (monthly).
library(PortfolioAnalytics)
library(ROI)
# index_returns is an xts object consisting of 3800 stock IDs (columns) and 60 observations at
# a monthly interval. To exemplify my problem, I set all values in index_returns to 1 to make
# sure that no NA values exist.
index_returns
any(is.na(index_returns)) # --> evaluates to FALSE
port_spec <- portfolio.spec(assets =colnames(index_returns) )
# Add a full investment constraint such that the weights sum to 1
port_spec <- add.constraint(portfolio = port_spec, type = "full_investment")
# Add a long only constraint such that the weight of an asset is between 0 and 1
port_spec <- add.constraint(portfolio = port_spec, type = "long_only")
# Add an objective to min portfolio variance
port_spec <- add.objective(portfolio = port_spec, type = "risk", name = "var")
# Solve the optimization problem
opt <- optimize.portfolio(R = index_returns, trace=TRUE, portfolio = port_spec,optimize_method = "ROI")
extractWeights(opt) #evaluates to NA for all assets
Does anyone know why this occurs and have any suggestions on how to deal with this issue? I know that this optimization problem very likely faces invertibility issues due to far more columns than rows, but apart from this notion I'm struggling to make any progress with my problem.
I highly appreciate any help!! Thanks in advance
Your optimization most likely fails because you have way more assets than observations. Then, as you correctly assumed, you can't obtain an inverse of the estimated covariance matrix.
To quote from "A Portfolio Optimization Approach with a Large Number of Assets: Applications to the US and Korean Stock Markets", available at: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ajfs.12233
"Many attempts have been made to find an invertible estimator of the covariance matrix when N is larger than T. The pseudoinverse estimators of the covariance matrix are used by Sengupta (1983) and Pappas et al. (2010), and the shrinkage estimators of the covariance matrix are suggested by Ledoit and Wolf (2003). Ledoit and Wolf (2003) propose estimating the covariance matrix by an optimally weighted average of two existing estimators: the sample covariance matrix and single-index covariance matrix."
So I'd suggest you take a look at the Ledoit-Wolf shrinkage method as a first step. The R package RiskPortfolios might also be useful, see https://joss.theoj.org/papers/10.21105/joss.00171
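A minimal sketch of that first step, assuming index_returns can be coerced to a plain returns matrix and that 'lw' and 'minvol' are the Ledoit-Wolf and minimum-volatility options documented for covEstimation and optimalPortfolio in RiskPortfolios:
library(RiskPortfolios)
# Ledoit-Wolf shrinkage estimate of the covariance matrix (well-conditioned even when N > T)
Sigma_lw <- covEstimation(as.matrix(index_returns), control = list(type = 'lw'))
# long-only minimum-volatility weights based on the shrunk covariance matrix
w_mv <- optimalPortfolio(Sigma = Sigma_lw, control = list(type = 'minvol', constraint = 'lo'))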

What is the formula to calculate the Gini with sample weights

I need your help to understand how I can obtain the same result as this function does:
gini(x, weights=rep(1,length=length(x)))
http://cran.r-project.org/web/packages/reldist/reldist.pdf --> page 2. Gini
Let's say we need to measure the income of a population N. To do that, we can divide the population N into K subgroups, and in each subgroup k we take n_k individuals and ask for their income. As a result, we get each individual's income, and each individual has a particular sample weight representing their contribution to the population N. Here is an example that I took from the previous link; the dataset is from the NLS.
rm(list=ls())
cat("\014")
library(reldist)
data(nls)
help(nls)
# Convert the wage growth from (log. dollar) to (dollar)
y <- exp(recent$chpermwage);y
# Compute the unweighted estimate
gini_y <- gini(y)
# Compute the weighted estimate
gini_yw <- gini(y,w=recent$wgt)
--- Here is the result ---
gini_y  = 0.3418394
gini_yw = 0.3483615
I know how to compute the Gini without weights with my own code, so I have no doubts about the command gini(y). The only thing I'm unsure about is the way gini(y, w) operates to obtain the result 0.3483615. I tried another calculation, as follows, to see whether I could come up with the same result as gini_yw. Here is code that I based on the CDF approach, Section 9.5, of the book "Relative Distribution Methods in the Social Sciences" by Mark S. Handcock:
#-------------------------
# test how gini computes with the sample weights
z <- exp(recent$chpermwage) * recent$wgt
gini_z <- gini(z)
# Result gini_z = 0.3924161
As you can see, my calculation gini_z is different from the command gini(y, weights). If any of you know how to build the correct computation to obtain exactly
gini_yw = 0.3483615, please give me your advice.
Thanks a lot friends.
function (x, weights = rep(1, length = length(x)))
{
    ox <- order(x)
    x <- x[ox]
    weights <- weights[ox]/sum(weights)
    p <- cumsum(weights)
    nu <- cumsum(weights * x)
    n <- length(nu)
    nu <- nu/nu[n]
    sum(nu[-1] * p[-n]) - sum(nu[-n] * p[-1])
}
This is the source code for the function gini, which can be seen by entering gini into the console (no parentheses or anything else).
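The key point is that the weights enter through the cumulative population and income shares, not by multiplying the incomes themselves (which is what z <- y * weights does). Applying the same steps by hand should reproduce gini_yw:
y <- exp(recent$chpermwage)
w <- recent$wgt
ox <- order(y)                  # sort by income
y <- y[ox]
w <- w[ox] / sum(w)             # normalised sample weights
p <- cumsum(w)                  # cumulative population share
nu <- cumsum(w * y)
nu <- nu / nu[length(nu)]       # cumulative income share
sum(nu[-1] * p[-length(p)]) - sum(nu[-length(nu)] * p[-1])  # should give 0.3483615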
EDIT:
This can be done for any function or object really.
This is a bit late, but one may be interested in the concentration/diversity measures contained in the SciencesPo package.

How to optimize parameters of the support vector machine using the genetic algorithm with R

In order to train a support vector machine, we must determine various parameters.
For example, there are parameters such as cost and gamma.
I am trying to determine the sigma and cost (C) parameters of an SVM using the "GA" package and the "kernlab" package in R.
I use accuracy as the evaluation function of the genetic algorithm.
I have created the following code, and I ran it.
library(GA)
library(kernlab)
data(spam)
index <- sample(1:dim(spam)[1])
spamtrain <- spam[index[1:floor(dim(spam)[1]/2)], ]
spamtest <- spam[index[((ceiling(dim(spam)[1]/2)) + 1):dim(spam)[1]], ]
f <- function(x)
{
  x1 <- x[1]
  x2 <- x[2]
  filter <- ksvm(type~., data = spamtrain, kernel = "rbfdot", kpar = list(sigma = x1), C = x2, cross = 3)
  mailtype <- predict(filter, spamtest[, -58])
  t <- table(mailtype, spamtest[, 58])
  return((t[1,1] + t[2,2]) / (t[1,1] + t[1,2] + t[2,1] + t[2,2]))  # accuracy
}
GA <- ga(type = "real-valued", fitness = f, min = c(-5.12, -5.12), max = c(5.12, 5.12), popSize = 50, maxiter = 2)
summary(GA)
plot(GA)
However, when I call the GA function, the following error is returned:
"No Support Vectors found. You may want to change your parameters"
I cannot understand what is wrong with the code.
Using GA for SVM parameters is not a good idea - it should be sufficient to just do a regular grid search (two for loops, one for C values and one for gamma values).
In R's library e1071 (which also provides SVMs) there is a method tune.svm which looks for the best parameters using a grid search.
Example
library(e1071)
data(iris)
obj <- tune.svm(Species~., data = iris, sampling = "fix",
                gamma = 2^c(-8,-4,0,4), cost = 2^c(-8,-4,-2,0))
plot(obj, transform.x = log2, transform.y = log2)
plot(obj, type = "perspective", theta = 120, phi = 45)
This also shows one important thing: you should look for good C and gamma values in a geometric manner, e.g. 2^x for x in {-10,-8,-6,-4,-2,0,2,4}.
GA is an algorithm for meta-optimisation, where the parameter space is huge and there is no easy relation between the parameters and the function being optimised. It requires tuning many more parameters than the SVM itself (number of generations, size of the population, mutation probability, crossover probability, mutation operator, crossover operator ...), so it is a completely useless approach here.
And of course, as was stated earlier in the comments, C and gamma have to be strictly positive.
For more details about using e1071 take a look at the CRAN document: http://cran.r-project.org/web/packages/e1071/e1071.pdf
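To see why the error occurs with your call: the search range c(-5.12, 5.12) lets sigma and C go negative, which is why ksvm complains. If you still want to run the GA, a minimal sketch (reusing the spamtrain/spamtest split from the question; the search ranges below are just plausible guesses) is to optimise over log2(sigma) and log2(C) so both stay strictly positive:
f.log <- function(x)
{
  # x[1] = log2(sigma), x[2] = log2(C); 2^x is always > 0
  filter <- ksvm(type~., data = spamtrain, kernel = "rbfdot",
                 kpar = list(sigma = 2^x[1]), C = 2^x[2], cross = 3)
  mailtype <- predict(filter, spamtest[, -58])
  t <- table(mailtype, spamtest[, 58])
  sum(diag(t)) / sum(t)  # accuracy as the fitness value
}
GA <- ga(type = "real-valued", fitness = f.log,
         min = c(-10, -5), max = c(-2, 10), popSize = 20, maxiter = 5)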

R: Robust fitting of data points to a Gaussian function

I need to do some robust data-fitting operation.
I have a bunch of (x, y) data that I want to fit to a Gaussian (aka normal) function.
The point is, I want to remove the outliers. As one can see on the sample plot below, there is another distribution that's polluting my data on the right, and I don't want to take it into account when doing the fitting (i.e. when finding \sigma, \mu and the overall scale parameter).
R seems to be the right tool for the job, I found some packages (robust, robustbase, MASS for example) that are related to robust fitting.
However, they assume the user already has a strong knowledge of R, which is not my case, and the documentation is only provided as a sort of reference manual, with no tutorial or equivalent. My statistical background is rather limited; I attempted to read reference material on fitting with R, but it didn't really help (and I'm not even sure that's the right way to go).
But I have the feeling that this is actually a quite simple operation.
I have checked this related question (and the linked ones); however, they take a single vector of values as input, and I have a vector of pairs, so I don't see how to adapt them.
Any help on how to do this would be appreciated.
To fit a Gaussian curve to the data, the principle is to minimise the sum of squared differences between the fitted curve and the data, so we define an objective function f and run optim on it:
fitG =
  function(x, y, mu, sig, scale){
    f = function(p){
      d = p[3] * dnorm(x, mean = p[1], sd = p[2])
      sum((d - y)^2)
    }
    optim(c(mu, sig, scale), f)
  }
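A usage sketch for the single-Gaussian fit, assuming (as in the plots below) that your x and y values live in data$V3 and data$V6 and that the starting values are eyeballed from the plot:
fitP <- fitG(data$V3, data$V6, mu = 6, sig = 0.6, scale = 0.02)
fitP$par  # fitted mean, sd and overall scale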
Now, extend this to two Gaussians:
fit2G <- function(x, y, mu1, sig1, scale1, mu2, sig2, scale2, ...){
  f = function(p){
    d = p[3] * dnorm(x, mean = p[1], sd = p[2]) + p[6] * dnorm(x, mean = p[4], sd = p[5])
    sum((d - y)^2)
  }
  optim(c(mu1, sig1, scale1, mu2, sig2, scale2), f, ...)
}
Fit with initial params from the first fit, and an eyeballed guess of the second peak. Need to increase the max iterations:
> fit2P = fit2G(data$V3,data$V6,6,.6,.02,8.3,0.10,.002,control=list(maxit=10000))
Warning messages:
1: In dnorm(x, mean = p[1], sd = p[2]) : NaNs produced
2: In dnorm(x, mean = p[4], sd = p[5]) : NaNs produced
3: In dnorm(x, mean = p[4], sd = p[5]) : NaNs produced
> fit2P
$par
[1] 6.035610393 0.653149616 0.023744876 8.317215066 0.107767881 0.002055287
What does this all look like?
> plot(data$V3,data$V6)
> p = fit2P$par
> lines(data$V3,p[3]*dnorm(data$V3,p[1],p[2]))
> lines(data$V3,p[6]*dnorm(data$V3,p[4],p[5]),col=2)
However, I would be wary of drawing statistical inferences about your function parameters...
The warning messages produced are probably due to the sd parameter going negative. You can fix this and also get a quicker convergence by using L-BFGS-B and setting a lower bound:
> fit2P = fit2G(data$V3,data$V6,6,.6,.02,8.3,0.10,.002,control=list(maxit=10000),method="L-BFGS-B",lower=c(0,0,0,0,0,0))
> fit2P
$par
[1] 6.03564202 0.65302676 0.02374196 8.31424025 0.11117534 0.00208724
As pointed out, sensitivity to initial values is always a problem with curve-fitting exercises like this.
Fitting a Gaussian:
# your data
set.seed(0)
data <- c(rnorm(100,0,1), 10, 11)
# find & remove outliers
outliers <- boxplot(data)$out
data <- setdiff(data, outliers)
# fitting a Gaussian
mu <- mean(data)
sigma <- sd(data)
# testing the fit, check the p-value
reference.data <- rnorm(length(data), mu, sigma)
ks.test(reference.data, data)

Resources