Optimising weights in R (Portfolio)

I have tried several packages in R and I am really lost as to which one I should be using. I just need help with the general direction; I can find my way to the exact code myself.
I am trying to do portfolio optimization in R. I need a weights vector to be calculated, where each weight in the vector represents the percentage of that stock.
Given the weights, I calculate total return, variance and Sharpe ratio (a function of return and variance).
There could be constraints, such as the weights summing to 1 (100%), and maybe others on a case-by-case basis.
I am trying to make my code flexible enough that I can optimize with different objectives (one at a time). For example, I could want minimum variance in one simulation, maximum return in another, or maximum Sharpe ratio in yet another.
This is pretty straightforward in Excel with the Solver add-in. Once I have the formulas entered, whichever cell I pick as the objective function, Solver calculates the weights for that objective, and the other quantities then follow from those weights. (E.g., if I optimize for minimum variance, it calculates the weights for minimum variance and then the return and Sharpe ratio based on those weights.)
I am wondering how to go about this in R. I am lost reading the documentation of several R packages and functions (lpSolve, optim, constrOptim, PortfolioAnalytics, etc.) and cannot find a starting point. My specific questions are:
Which would be the right R package for this kind of analysis?
Do I need to define a separate function for each possible objective (variance, return, Sharpe) and optimize those functions? This is a little tricky because the Sharpe ratio depends on both variance and return. So if I want to optimize the Sharpe function, do I need to nest it within the variance and return functions?
I just need some ideas on how to start and I can give it a try. If I at least get the right package and a relevant example to work from, that would be great. I have searched a lot on the web but I am really lost.
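To illustrate one possible starting point, here is a minimal base-R sketch with simulated data (the returns matrix and all function names are made up for illustration): one generic optimiser that takes the objective as a function. It also answers the nesting question, since the Sharpe objective simply calls the return and variance functions.

```r
# Minimal base-R sketch with simulated data. The re-normalisation trick
# only enforces sum(w) == 1; it does not rule out negative weights.
set.seed(1)
returns <- matrix(rnorm(200 * 4, mean = 0.001, sd = 0.02), ncol = 4)

mu    <- colMeans(returns)   # expected return per asset
Sigma <- cov(returns)        # covariance matrix

port_return   <- function(w) sum(w * mu)
port_variance <- function(w) as.numeric(t(w) %*% Sigma %*% w)
port_sharpe   <- function(w) port_return(w) / sqrt(port_variance(w))

# One generic optimiser: re-normalise inside the objective so that
# unconstrained optim() still respects sum(w) == 1.
optimise_weights <- function(objective, n = length(mu)) {
  fn  <- function(x) objective(x / sum(x))
  res <- optim(rep(1 / n, n), fn)
  res$par / sum(res$par)
}

w_minvar <- optimise_weights(port_variance)                # min variance
w_maxshp <- optimise_weights(function(w) -port_sharpe(w))  # max Sharpe
# Note: a pure max-return objective also needs box constraints
# (e.g. long-only), otherwise it is unbounded.
```

For real work, PortfolioAnalytics wraps exactly this pattern (objectives and constraints are added one at a time with add.objective() and add.constraint(), then solved with optimize.portfolio()), and quadprog's solve.QP handles the minimum-variance case exactly.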

Related

meta-analysis multiple outcome variables

As you might be able to tell from the sample of my dataset, it contains a lot of dependency, with each study providing multiple outcomes for the construct I am looking at. I was planning on using the metacor library because I only have information about the sample sizes but not the variances. However, all the methods I came across that deal with dependency, such as the robumeta package, require variances (I know some people average the effect sizes within a study, but I have read that this tends to produce larger error rates). Do you know if there is an equivalent package that uses only the sample size, or is it mathematically impossible to determine the weights without the variances?
Please note that I am a student, no expert.
You could use the escalc function of the metafor package to calculate variances for each effect size. In the case of correlations, it only needs the raw correlation coefficients and the corresponding sample sizes.
See the section "Outcome Measures for Variable Association": https://www.rdocumentation.org/packages/metafor/versions/2.1-0/topics/escalc
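As a concrete sketch (the correlations and sample sizes below are made up): with measure "ZCOR", escalc returns Fisher r-to-z transformed correlations, whose sampling variance depends only on the sample size.

```r
library(metafor)

# Hypothetical data: raw correlations and their sample sizes
dat <- data.frame(ri = c(0.30, 0.45, 0.12), ni = c(50, 80, 120))

# "ZCOR" = Fisher's r-to-z transform; sampling variance is 1/(ni - 3)
dat <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat)
dat$vi   # the variances needed by robumeta and similar packages
</test>
```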

in R, does a "goodness of fit" value exist for vegan's CCA, similar to NMDS's "stress" value?

I would like to know if there is a way to extract something similar to the metaMDS "stress" value from a vegan cca object? I've tried the goodness.cca function and its relatives
(http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/goodness.cca.html)
They tell me about the statistics per sample, but I'm interested in the overall goodness of fit for reducing a multidimensional system to two dimensions (if something like that exists, given that CCA uses different calculations than NMDS).
I would like to continue with vegan, if possible, though I found this link here:
(Goodness of fit in CCA in R)
Thanks a lot
RJ
It is called the eigenvalue. Earlier, people complained that NMDS does not have anything like an eigenvalue, but only has stress. The total variation in the data is the sum of all eigenvalues, so the proportion of one eigenvalue (or of the cumulative sum of eigenvalues) out of that total is the proportion explained. All of these can be extracted with eigenvals() and its summary() (see their help with ?eigenvals). Significance tests for the axes are available with anova.cca() (look at its documentation for references).
The web page you referred to is about another method, canonical correlation analysis. In vegan we have that in CCorA, with a permutation test. Just pick your method, and then find your tools.
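A short sketch using vegan's bundled dune data (chosen here purely for illustration):

```r
library(vegan)
data(dune)
data(dune.env)

# A constrained ordination on the example data
mod <- cca(dune ~ Management, data = dune.env)

ev <- eigenvals(mod)
summary(ev)              # proportion and cumulative proportion per axis
sum(ev[1:2]) / sum(ev)   # share of total variation on the first two axes
anova(mod)               # permutation test for the constrained model
```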

1 sample t-test from summarized data in R

I can perform a one-sample t-test in R with the t.test command, but this requires an actual set of data; I can't use summary statistics (sample size, sample mean, standard deviation). I can work around this using the BSDA package. But are there any other ways to accomplish this one-sample t-test in R without the BSDA package?
Many ways. I'll list a few:
directly calculate the p-value by computing the statistic and calling pt with it and the degrees of freedom as arguments, as commenters suggest above (it can be done in a single short line of R; ekstroem shows the two-tailed case, and for a one-tailed test you wouldn't double it)
alternatively, if it's something you need a lot, you could convert that into a nice robust function, even adding in tests against non-zero mu and confidence intervals if you like. Presumably if you go this route you'll want to take advantage of the functionality built around the htest class
(code, and even a reasonably complete function, can be found in the answers to this stats.SE question)
If the samples are not huge (smaller than a few million values, say), you can simulate data with exactly the same mean and standard deviation and call the ordinary t.test function. If m, s and n are the mean, sd and sample size, t.test(scale(rnorm(n))*s + m) should do it (it doesn't matter what distribution you use, so runif would work as well). Note the importance of calling scale there. This makes it easy to change the alternative or get a CI without writing more code, but it wouldn't be suitable if you had millions of observations and needed to do it more than a couple of times.
call a function in a different package that will calculate it; there are at least one or two other such packages (you don't make it clear whether using BSDA was a problem or whether you wanted to avoid packages altogether)
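The simulation trick from the list above, spelled out with made-up summary statistics:

```r
# Hypothetical summary statistics: mean, sd and sample size
m <- 5.2; s <- 1.1; n <- 30

set.seed(42)
x <- as.numeric(scale(rnorm(n))) * s + m  # scale() forces mean 0, sd 1 first
c(mean(x), sd(x))                         # exactly m and s by construction
t.test(x, mu = 5)                         # same t and p as with the real data
```

The reported t statistic equals (m - 5) / (s / sqrt(n)), exactly as if it had been computed from the summary statistics directly.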

Use a KNN-regression algorithm in R

I am working on using the k nearest neighbours of an individual whose value of a certain variable is known (training set) to determine the value of that same variable for an individual whose value is unknown (test set). Two possible approaches can be taken:
first (the easy one), calculate the mean value of the variable over the k individuals; second (the better one), calculate a value weighted by distance, according to the proximity of the individuals.
My first approach has been to use the knn.index function in the FNN package to identify the nearest neighbours, and then use the indices to look up the values in the dataset and compute the mean. This works very slowly, as the dataset is quite big. Is there an algorithm already implemented that does this calculation faster, and would it be possible to add weights according to distance?
After a week of trying to solve the problem, I found a function in R which answered my question; this might help others who have struggled with the same issue.
The function is named kknn, and it is in the kknn package. It lets you do a KNN regression while weighting the points by distance.
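A minimal sketch (using the built-in iris data purely for illustration):

```r
library(kknn)
data(iris)

train <- iris[seq(1, 150, by = 2), ]   # odd rows as training set
test  <- iris[seq(2, 150, by = 2), ]   # even rows as test set

# kernel = "triangular" weights neighbours by distance;
# kernel = "rectangular" gives the plain unweighted mean instead
fit <- kknn(Sepal.Length ~ ., train = train, test = test,
            k = 5, kernel = "triangular")

head(fitted(fit))   # distance-weighted KNN predictions for the test rows
```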

Nonlinear regression / Curve fitting with L-infinity norm

I am looking into time series data compression at the moment.
The idea is to fit a curve on a time series of n points so that the maximum deviation on any of the points is not greater than a given threshold. In other words, none of the values that the curve takes at the points where the time series is defined, should be "further away" than a certain threshold from the actual values.
So far I have found out how to do nonlinear regression using the least-squares estimation method in R (the nls function) and in other languages, but I haven't found any packages that implement nonlinear regression with the L-infinity norm.
I have found literature on the subject:
http://www.jstor.org/discover/10.2307/2006101?uid=3737864&uid=2&uid=4&sid=21100693651721
or
http://www.dtic.mil/dtic/tr/fulltext/u2/a080454.pdf
I could try to implement this in R, for instance, but I am first looking to see whether it hasn't already been done, so that I could maybe reuse it.
I have found a solution that I don't believe to be "very scientific": I use nonlinear least-squares regression to find starting values for the parameters, which I subsequently use as starting points for R's optim function, minimizing the maximum deviation of the curve from the actual points.
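The two-stage approach described above can be sketched on made-up data (the model form, data and threshold are all assumptions here):

```r
set.seed(1)
t <- seq(0, 10, length.out = 100)
y <- 2 * exp(-0.3 * t) + rnorm(100, sd = 0.02)   # hypothetical time series

# Stage 1: nonlinear least squares to get reasonable starting values
fit_ls <- nls(y ~ a * exp(b * t), start = list(a = 1, b = -0.1))

# Stage 2: minimise the maximum absolute deviation (the L-infinity norm)
linf <- function(p) max(abs(y - p[1] * exp(p[2] * t)))
fit_linf <- optim(coef(fit_ls), linf)

fit_linf$value   # worst-case deviation; accept the fit if below threshold
```

Because the max makes the objective non-smooth, a derivative-free method such as optim's default Nelder-Mead is the safer choice for the second stage.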
Any help would be appreciated. The idea is to be able to find out if this type of curve-fitting is possible on a given time series sequence and to determine the parameters that allow it.
I hope there are other people that have already encountered this problem out there and that could help me ;)
Thank you.
